Library Resources &
Technical Services



ISSN 0024-2527 
January 2008 
Volume 52, No. 1 



Use of the Checklist Method for Content 
Evaluation of Full-text Databases 

Thomas E. Nisonger 

Mass Digitization 

Trudi Bellardo Hahn 

Converting and Preserving the Scholarly Record 

Jeffrey L. Horrell 

Who Has Published What on East Asian Studies? 

Su Chen and Chengzhi Wang 

Web Citation Availability 

Mary F. Casserly and James E. Bird 

An Operational Model for Library 
Metadata Maintenance 

Jim LeBlanc and Martin Kurth 










The Association for Library Collections & Technical Services 






From The Library of Congress 



For 30 days 







2 Essential Cataloging & Classification 

Tools on the Web 





Cataloger's Desktop



► Log On 







CLASSIFICATION 

WEB 



CATALOGER'S DESKTOP     CLASSIFICATION WEB



The most widely used cataloging 
documentation resources in an 
integrated, online system — 
accessible anywhere. 

• Look up a rule in AACR2 and then quickly and easily 
consult the rule's LC Rule Interpretation (LCRI). 

• Turn to dozens of cataloging publications and metadata 
resource links plus the complete MARC 21 documentation. 

• Find what you need quickly with the enhanced, 
simplified user interface. 

Free trial accounts & 
annual subscription prices: 

Visit www.loc.gov/cds/desktop 

For free trial, complete the order form at 
www.loc.gov/cds/desktop/OrderForm.html 

AACR2 is the joint property of the American Library Association, the Canadian Library
Association, and the Chartered Institute of Library and Information Professionals.



Full-text display of all LC classification 
schedules & subject headings. 
Updated daily 

• Find LC/Dewey correlations: match LC classification
and subject headings to Dewey® classification
numbers as found in LC cataloging records. Use in
conjunction with OCLC's WebDewey® service for
perfect accuracy.

• Search and navigate across all LC classes or the 
complete LC subject headings. 

Free trial accounts & 
annual subscription prices: 

Visit www.loc.gov/cds/classweb 

For free trial, complete the order form at 
www.loc.gov/cds/classweb/application.html 

Dewey and WebDewey are registered trademarks 
of OCLC Online Computer Library Center, Inc. 




Library of Congress | cataloging distribution service 

101 INDEPENDENCE AVENUE, S.E., WASHINGTON, D.C. 20541-4912 U.S.A.

TOLL-FREE PHONE IN U.S. 1-800-255-3666 | OUTSIDE U.S. CALL +1-202-707-6100 

FAX +1-202-707-1334 | WEBSITE: WWW.LOC.GOV/CDS | E-MAIL: CDSINFO@LOC.GOV 



Library Resources & Technical Services (ISSN
0024-2527) is published quarterly by the American
Library Association, 50 E. Huron St., Chicago,
IL 60611. It is the official publication of the
Association for Library Collections & Technical
Services, a division of the American Library
Association. Subscription price: to members of the
Association for Library Collections & Technical
Services, $27.50 per year, included in the membership
dues; to nonmembers, $75 per year in U.S.,
Canada, and Mexico, and $85 per year in other
foreign countries. Single copies, $25. Periodical
postage paid at Chicago, IL, and at additional mailing
offices. POSTMASTER: Send address changes
to Library Resources & Technical Services, 50 E.
Huron St., Chicago, IL 60611. Business Manager:
Charles Wilt, Executive Director, Association
for Library Collections & Technical Services, a
division of the American Library Association.
Send manuscripts to the Editorial Office: Peggy
Johnson, Editor, Library Resources & Technical
Services, University of Minnesota Libraries, 499
Wilson Library, 309 19th Ave. So., Minneapolis,
MN 55455; (612) 624-2312; fax: (612) 626-9353;
e-mail: m-john@umn.edu. Advertising: ALCTS, 50
E. Huron St., Chicago, IL 60611; 312-280-5038;
fax: 312-280-5032. ALA Production Services: Troy
D. Linker, Angela Hanshaw, and Chris Keech.
Members: Address changes and inquiries should
be sent to Membership Department, Library
Resources & Technical Services, 50 E. Huron
St., Chicago, IL 60611. Nonmember subscribers:
Subscriptions, orders, changes of address, and
inquiries should be sent to Library Resources
& Technical Services, Subscription Department,
American Library Association, 50 E. Huron St.,
Chicago, IL 60611; 1-800-545-2433; fax: (312) 944-2641;
subscription@ala.org.

Library Resources & Technical Services is indexed in
Library Literature, Library & Information Science
Abstracts, Current Index to Journals in Education,
Science Citation Index, and Information Science
Abstracts. Contents are listed in CALL (Current
American–Library Literature). Its reviews are
included in Book Review Digest, Book Review
Index, and Review of Reviews.

Instructions for authors appear on the Library
Resources & Technical Services Web page at
www.ala.org/alcts/lrts. Copies of books for review should
be addressed to Edward Swanson, Book Review
Editor, Library Resources & Technical Services,
1065 Portland Ave., Saint Paul, MN 55104; e-mail:
swans152@umn.edu.

©2008 American Library Association

All materials in this journal subject to copyright by
the American Library Association may be photocopied
for the noncommercial purpose of scientific
or educational advancement granted by Sections 
107 and 108 of the Copyright Revision Act of 1976. 
For other reprinting, photocopying, or translating, 
address requests to the ALA Office of Rights and 
Permissions, 50 E. Huron St., Chicago, IL 60611. 

The paper used in this publication meets the mini- 
mum requirements of American National Standard 
for Information Sciences — Permanence of Paper for 
Printed Library Materials, ANSI Z39.48-1992.

Publication in Library Resources & Technical
Services does not imply official endorsement by
the Association for Library Collections & Technical
Services nor by ALA, and the assumption of edito- 
rial responsibility is not to be construed as endorse- 
ment of the opinions expressed by the editor or 
individual contributors. 



Library Resources &
Technical Services






Editorial

ARTICLES

Use of the Checklist Method for Content Evaluation of Full-text Databases
An Investigation of Two Databases Based on Citations from Two Journals
Thomas E. Nisonger

Mass Digitization
Implications for Preserving the Scholarly Record
Trudi Bellardo Hahn

Converting and Preserving the Scholarly Record
An Overview
Jeffrey L. Horrell

Who Has Published What on East Asian Studies?
An Analysis of Publishers and Publishing Trends
Su Chen and Chengzhi Wang

Web Citation Availability
A Follow-up Study
Mary F. Casserly and James E. Bird

An Operational Model for Library Metadata Maintenance
Jim LeBlanc and Martin Kurth

NOTES ON OPERATIONS

Determining the Average Cost of a Book for Allocation Formulas
Comparing Options
Virginia Kay Williams and June Schmidt

Index to Advertisers
Book Review



Cover image courtesy of morgueFile (www.morguefile.com).



Association for Library Collections & Technical Services
Visit LRTS online at www.ala.org/alcts/lrts.

For current news and reports on ALCTS activities, see the ALCTS Newsletter Online at 
www.ala.org/alcts/alcts_news. 






LRTS 52(1) 



EDITORIAL BOARD 

Editor and Chair 
Peggy Johnson 
University of Minnesota 

Members 

Kristin Antelman, North Carolina

State University 

Stephen Bosch, University of 

Arizona 

Yvonne Carignan, University of 
Maryland 

Mary Casserly, University at Albany 

Elisa Coghlan, University of 
Washington 

Tschera Harkness Connell, Ohio
State University 

Magda A. El-Sherbini, Ohio State 
University 

Karla L. Hahn, Association of 
Research Libraries 

Dawn Hale, Johns Hopkins 
University 

Sara C. Heitshu, University of 
Arizona 

Judy Jeng, New Jersey City 
University (Intern) 

Shirley J. Lincicum, Western
Oregon University 

Bonnie MacEwan, Auburn 
University 

Carolynne Myall, Eastern 
Washington University 

Pat Riva, Bibliothèque et Archives
nationales du Québec

Diane Vizine-Goetz, OCLC, Inc. 

Ex-Officio Members 

Charles Wilt, Executive Director, 
ALCTS 

Mary Beth Weber, Rutgers 
University, Editor, ALCTS 
Newsletter Online 

Edward Swanson, MINITEX 
Library Information Network, 
Book Review Editor, LRTS 



Editorial 

Peggy Johnson 

This first issue of 2008 is another terrific collection of 
papers addressing the rapidly changing environment in 
which we work. Thomas E. Nisonger returns to a familiar 
analysis tool, the checklist method, which was developed in a 
print environment, and evaluates the full text, indexing, and 
abstracting coverage of two databases (Library Literature 
and Information Science Full Text and EBSCOhost Academic 
Search Premier). Nisonger compares citations to journal 
articles that were published in Library Resources & Technical Services and
Collection Building. He concludes by identifying areas for future research. 

We are delighted to publish two papers based on presentations given at 
the Eighth Annual Symposium on Scholarly Communication, "Converting 
and Preserving the Scholarly Record," held at State University of New York, 
Albany, October 24, 2006. Trudi Bellardo Hahn's paper, "Mass Digitization: 
Implications for Preserving the Scholarly Record," looks at the intersection (or 
not) of libraries' interests and those of commercial entities in terms of qual- 
ity, secrecy, and long-term stability. She issues a call for the library profession 
to exercise strong leadership in how best to preserve the scholarly record. 
Jeffrey L. Horrell, in "Converting and Preserving the Scholarly Record: An 
Overview," which was delivered at the same symposium, explores pertinent 
aspects of the challenge and concludes with recommended elements for a 
campuswide digital repository.

Su Chen and Chengzhi Wang consider scholarly and publishing trends in 
Western-language monographs in East Asian studies from 2000 through 2005. 
Their findings demonstrate increased activity and interest in this area as publish- 
ers and academia pay more attention to China, Japan, and Korea. 

How persistent is a URL? Mary F. Casserly and James E. Bird seek to answer
this question through statistical analysis and identification of citation character- 
istics associated with availability. Building on research the authors conducted 
in 2002, they report that the overall availability of Web content in the sample 
dropped from 89.2 percent to 80.6 percent. 

Jim LeBlanc and Martin Kurth propose an operational model for maintain- 
ing library metadata. They begin by noting that few libraries devote the same 
level of attention and resources to maintaining non-MARC metadata as they 
devote to MARC records and to the traditional catalog. The model that LeBlanc 
and Kurth suggest builds on the idea that the expertise and skills that guide 
catalog data curation can be applied to metadata maintenance in a broader set of 
information delivery systems. 

This issue's Notes on Operations piece by Virginia Kay Williams and June 
Schmidt investigates methods of determining average prices used in allocation 
formulas and discusses the advantages and disadvantages of different approach- 
es — drawing on data from Mississippi State University Libraries, the Bowker 
Annual, previous acquisition cost data, Blackwell Price Reports, and Blackwell 
approval plan profiles. 




Is managing your e-journal collection 
more difficult than you expected? 



We can help 




From obtaining an accurate list of the titles you've ordered to handling 
registration issues to troubleshooting access problems and more, 
managing electronic collections can require more time and attention 
than you have to give. 



EBSCO's services include e-journal audits to confirm that your library 
is billed only for the titles ordered, itemized invoices to facilitate budget 
allocation and customized serials management reports to assist with 
collection development. 

Our dedicated e-journal customer service team assists with non-access 
problems, IP address changes and more. And our suite of e-resource 
access and management tools minimizes administrative tasks while 
maximizing patron experience. 

Let us put our expertise to work for you. Contact your EBSCO sales 
representative today. 



EBSCO Information Services
www.ebsco.com






Thomas E. Nisonger (nisonge@indiana.edu)
is Professor, Indiana University,
School of Library and Information 
Science, Bloomington. 

The author gratefully acknowledges his 
graduate assistants in Indiana University's 
School of Library and Information 
Science, Sara Franks, Catherine Hall, and 
Suzanne Switzer, who assisted in a variety 
of ways, including checking citations in 
the databases and helping tabulate 
the results. 



Submitted January 14, 2007; tentatively 
accepted pending revision March 11, 
2007; revised and resubmitted April 6, 
2007, and accepted for publication. 



Use of the Checklist 
Method for Content 
Evaluation of Full-text 
Databases 

An Investigation of Two Databases 
Based on Citations from Two 
Journals 

By Thomas E. Nisonger 

Following a detailed (but not comprehensive) review of the use of citation data as 
checklists for library collection evaluation, the use of this technique for evaluat- 
ing database content is explained. This paper reports an investigation of the full- 
text and indexing and abstracting coverage of Library Literature & Information 
Science Full Text and EBSCOhost Academic Search Premier, based on checking 
citations to journal articles in the 2004 volumes of Library Resources & Technical 
Services and Collection Building. Analysis of these citations shows they were 
predominately to English-language library and information science journals pub- 
lished in the United States, with the majority dating from 2000 to 2004. Library 
Literature & Information Science Full Text contained 21.1 percent of the citations 
in full-text format, while the corresponding figure for Academic Search Premier 
was 16.1 percent. The database coverage also is analyzed by publication date, 
country of origin, and Library of Congress classification number of cited items. 
Some limitations to the study are acknowledged, while issues for future research 
are outlined. 

That the librarianship paradigm is rapidly changing with the evolution from 
a print to an electronic environment is almost a cliche. Relatively new for- 
mats, such as full-text databases, electronic journals, electronic books, and the 
Web, offer numerous challenges to contemporary librarians, including a need for 
evaluation techniques. While a host of generally accepted collection evaluation 
methods were developed for the twentieth century's relatively stable, mostly print 
environment, identifying appropriate evaluation methodologies ranks among the 
library profession's major challenges in the first decade of the twenty-first century. 
As will be illustrated in the following literature review, the checklist method, dat- 
ing to the mid-nineteenth century, is one of the oldest and among the most often 
used approaches to library collection evaluation. This paper's purpose is to dem- 
onstrate the use of a citation-based checklist approach by evaluating the content 
of two full-text databases: Library Literature & Information Science Full Text and
Academic Search Premier. 






The Guide to the Evaluation of Library Collections 
offers a succinct definition of the checklist approach: "With 
this procedure the evaluator selects lists of titles or works 
appropriate to the subjects collected, to the programs or 
goals of the library, or to the programs and goals of consor- 
tia. These lists are then searched in the library files to deter- 
mine the percentage the library has in its own collection." 1 
More specifically, the lists are checked in the library's cata- 
log (originally a card catalog, now an online public access 
catalog [OPAC]). 
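
The arithmetic behind this definition can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the case-insensitive matching rule, and the sample titles are assumptions introduced here and are not part of the Guide's procedure.

```python
def percent_held(checklist, holdings):
    """Return the percentage of checklist titles found in the holdings."""
    def normalize(title):
        # Assumed matching rule: ignore case and surrounding whitespace.
        return title.strip().lower()

    wanted = {normalize(t) for t in checklist}
    if not wanted:
        raise ValueError("checklist is empty")
    held = wanted & {normalize(t) for t in holdings}
    return 100.0 * len(held) / len(wanted)


# Hypothetical data for illustration only.
checklist = ["Title A", "Title B", "Title C", "Title D"]
holdings = ["title a", "Title C ", "Title Z"]
print(percent_held(checklist, holdings))  # 50.0
```

In practice the matching step is the difficult part, since catalog records and list entries rarely agree character for character; the exact-match rule above is a deliberate simplification.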

The benefits and drawbacks associated with the check- 
list technique have been discussed in the literature by 
Lockett, Lundin, and the author, among others. 2 On the 
positive side, lists can be compiled to meet the needs of a 
particular library or type of library and they can be exam- 
ined to increase knowledge of the literature. Lists also are 
straightforward to implement, require little subject exper- 
tise, and provide objective data that is easily understood. On 
the negative side, the collection might hold other resources 
better than those on the list; not all items on the list are of
equal value; appropriate lists might be difficult to locate;
held items might not be available because they are checked 
out, missing, or for other reasons; and many lists focusing on 
a single subject area do not consider resources from other 
disciplines. One of the more compelling criticisms is the fact 
that the checklist approach was developed to test ownership 
in the traditional model of librarianship and usually does 
not consider items obtained on interlibrary loan or licensed 
electronically. 

History of the Checklist Method 

According to Mosher and other authorities, the earli- 
est reported collection evaluation in an American library, 
published in 1849, used the checklist method. 3 That inves- 
tigation, written by the Smithsonian Institution's assistant 
secretary Charles Coffin Jewett, used the citations in leading 
mid-nineteenth century textbooks in chemistry, commerce, 
ethnography, and international law as the checklist and 
concluded that North American libraries were inadequate 
compared to their European counterparts. 4 

A major collection evaluation at the University of 
Chicago during the early 1930s relied upon the checklist 
method. As part of an ambitious collection-building project 
led by M. Llewellyn Raney, more than four hundred bib- 
liographies were checked by approximately two hundred 
faculty members, resulting in a multimillion-dollar desiderata
list. 5 In the mid-1930s, Waples and Lasswell used
the checklist approach to evaluate select social science 
areas in six major American research libraries, including 
the Library of Congress (LC), Harvard, and the New York 
Public Library. 6 Two checklist collection evaluations published
during the 1960s have sometimes been termed classic
studies: Coale's evaluation of the Newberry Library's Latin 
American Colonial history holdings along with comparative 
data for the University of Texas at Austin, the University of 
California at Berkeley Libraries, and the Hispanic Society 
of America, and Webb's assessment of medieval studies, art 
history, political science, physics, Slavic studies, and United 
States and United Kingdom social and literary history at the 
University of Colorado. 7 

The checklist technique (sometimes in combination 
with other approaches) also has been used for the evalu- 
ation of library holdings in science and technology at the 
University of Idaho by Burns; the periodicals collection 
at James Madison University by Bolgiano and King; his- 
tory of Christianity at Ohio State University by Shiels and 
Alt; music at Louisiana State University by Taranto and 
Perrault; irrigation at the University of Illinois by Porta 
and Lancaster; biocatalysis and applied molecular biology 
at Columbia University by Kehoe and Stein; theatre arts at 
the University of California at Sacramento by Snow; math- 
ematics at Winona State University by Dennison; the legal 
collection at Suffolk University Law Library by Flaherty; 
and graphic novels at the University of Memphis (although 
specific results are not reported) by Matz. 8 In addition, the 
method was used by Larson to test the accuracy and consis- 
tency of assigned Conspectus collection levels in French lit- 
erature by twenty Research Libraries Group (RLG) libraries 
in a Conspectus verification study. 9 Note that this paragraph 
does not contain a comprehensive listing as numerous other 
examples could be cited. 

Citation-based Checklists 

Most of the earliest checklist evaluations used bibliogra- 
phies, recommended lists, or other so-called "authorita- 
tive" sources. Yet the Guide to the Evaluation of Library 
Collections outlines fifteen possible sources for a checklist, 
such as course syllabi or reading lists, publisher or dealer 
catalogs, bestseller lists, the holdings of important librar- 
ies, and so on. 10 Two of the fifteen relate to citations: lists 
of highly cited journals, such as those in the Institute for 
Scientific Information's Journal Citation Reports (JCR) and 
"citations contained in publications." 11 

Citation analysis is a well-established library and infor- 
mation science research methodology that is frequently 
used to analyze scholarly communications patterns as well 
as for numerous evaluative purposes. Citations selected 
from journals, textbooks, dissertations and theses, faculty 
publications, and other sources have frequently been used 
as collection evaluation checklists. The advantages and 
disadvantages of using citations for checklists have been 
reviewed by this author. 12 The technique is based on the
assumption that the cited sources were used by researchers,
and thus should be contained in a library collection support- 
ing research. Relevant interdisciplinary or multidisciplinary 
citations might be included that would not appear on other 
lists specific to a particular subject. Among the disadvan- 
tages, some citations may be peripheral to the topic, the 
technique focuses on library patrons who publish, and an 
item might be cited simply because it is available rather than 
because it is the best resource. 

Heidenwolf asserts that the use of citations for check- 
lists originated during the 1950s and cites a 1957 study 
by Emerson. 13 She categorizes Jewett's well-known 1849 
evaluation, described above, as an example of checking an 
"authoritative bibliography," but Jewett's study also was a 
citation-based checklist, as it used references from text- 
books. 14 The most frequently used methods for selecting 
citations for checklist evaluation will be reviewed below. 
Note that illustrative examples are provided for each cat- 
egory rather than a comprehensive review. 

Citations from Journals 

This researcher used two methods for selecting citations 
from political science journals (the first based on three 
years of the American Political Science Review and the 
second based on one year of five other journals) to evalu- 
ate the political science collections of George Washington, 
Georgetown, Howard, Catholic, and George Mason university
libraries. 15 Utilizing the author's second method,
Heidenwolf used citations from five epidemiology journals 
to evaluate the epidemiology collection in the University of 
Michigan Library system and its Public Health Library. 16 
Gleason and Deffenbaugh selected citations from three 
biblical studies journals to evaluate the University of Notre 
Dame Library's holdings on that topic. 17 In addition to 
other methods, Crawley-Low evaluated the University 
of Saskatchewan's toxicology collection by using citations 
to books from a three-year run of the Annual Review of 
Pharmacology and Toxicology as a checklist. 18 Journal cita- 
tions also were used in a checklist evaluation of irrigation 
at the University of Illinois at Urbana-Champaign by Porta 
and Lancaster. 19 

Citations from Textbooks 

This is the approach used by Jewett in the 1840s. 20 Among 
numerous methods employed in the evaluation of the 
Washington University School of Medicine's ophthalmol- 
ogy monograph collection, Gallagher used the one hundred 
monographic citations in the classic textbook Ophthalmology: 
Principles and Concepts as a checklist to address the ques- 
tion whether the book could have been written with the 
library's resources. 21 In a similar vein, Watson selected
citations from Duane's Clinical Ophthalmology along with
another non-citation source. 22 Her checklists were used by 
members of the Association of Vision Science Librarians 
to evaluate their collections with results for twenty-one 
unidentified libraries reported. 

Bland checked citations from twenty-five textbooks 
(five each in mathematics, philosophy, physics, psychology, 
and sociology) against the holdings of the Western Carolina 
University Library, predicated on the assumption that the 
collection's relevance for teaching purposes would be tested 
because the citations were taken from textbooks for courses 
taught in the curriculum. 23 Following up on Bland's work, 
Stelk and Lancaster checked citations from five religious 
studies textbooks against the holdings of the University of 
Illinois at Urbana-Champaign undergraduate and main uni- 
versity libraries, and confirmed the technique's usefulness 
for evaluation of undergraduate collections. 24 In another 
permutation on the use of textbooks, Currie selected one 
citation from the textbook for each of eighty courses taught 
at Firelands College, a two-year branch of Bowling Green
State University, to create a checklist for evaluating both the 
branch and the main library. 25 

Citations from Dissertations and Theses 

Citations from dissertations and theses have been used as a 
checklist to test the ability of university libraries to support 
doctoral- and masters-level research. In the earliest known 
study, Emerson analyzed the citations in twenty-three engi- 
neering dissertations completed at Columbia University 
from 1950 through 1954, and then used them as a checklist 
to evaluate the Columbia Libraries engineering holdings. 26 
Herubel used a list of the journals and serials cited twice 
in philosophy dissertations written at Purdue University as 
a checklist for evaluating the Purdue library's periodical col- 
lection. 27 The University of California at Irvine's library was 
evaluated by Buzzard and New based on a checklist of cita- 
tions selected from thirty-six dissertations (twelve each from 
the sciences, social sciences, and humanities) completed at 
that institution. 28 Citations from sixty-five master's theses in 
human resources development were used by Moulden to 
evaluate the National College of Education's ability to sup- 
port off-campus programs. 29 

Citations from Faculty Publications

The selection of citations from dissertations as well as facul- 
ty-authored books and articles at Loughborough University 
in the United Kingdom has been reported by Lewis in a 
study that also examined interlibrary loan (ILL) records 
to determine if unheld items had been borrowed. 30 To test 
the capability of the Pennsylvania State University's branch 
campus libraries to support faculty research, Neal and Smith
checked citations from journal articles published by branch
faculty against the system holdings. 31 Haas and Lee evalu- 
ated the University of Florida library's periodical holdings in 
forestry by checking journal titles cited in faculty publica- 
tions as well as articles written by faculty. 32 



The Lopez Method 

Although infrequently used, the Lopez method offers an 
interesting variation on the citations-as-checklist technique that
is worth noting. Lopez described an evaluation method, 
developed at the State University of New York at Buffalo 
Library, that extends the checklist technique through 
four hierarchical levels. 33 He explained this approach as
follows:

Select at random from a critical bibliography, a 
number of references. Check these references 
against the library's holdings. If those references 
are available, then take as your second reference, 
the first citation in that publication's footnote. 
Repeat the procedure until either the library lacks 
the material cited or until a fourth and final citation 
is obtained. 34 

Lopez then outlined a 10-20-40-80 scoring method 
for items held at levels one through four respectively. 35 
This researcher reported a test of Lopez's method at the 
University of Manitoba Library in four subject areas (family 
therapy, the American novel, modern British history, and 
Medieval French literature) that concluded that the method 
measured a collection's depth for supporting research, but 
was unreliable because of inconsistent results between the 
two different tests in each subject. 36 
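
The chain-following procedure and the 10-20-40-80 weights can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the callbacks `is_held` and `first_citation` are hypothetical stand-ins for a catalog search and for reading the first footnote of a held publication, and the choice to add the points for each held level into one running score is an assumption, since the description above gives only the weights.

```python
LEVEL_POINTS = (10, 20, 40, 80)  # weights for levels one through four


def lopez_score(start_ref, is_held, first_citation):
    """Score one starting reference by following its citation chain.

    is_held(ref) -> bool and first_citation(ref) -> ref or None are
    hypothetical callbacks, not part of any real library system.
    """
    score = 0
    ref = start_ref
    for points in LEVEL_POINTS:
        if ref is None or not is_held(ref):
            break  # the chain ends when the library lacks the item
        score += points  # assumption: points accumulate down the chain
        ref = first_citation(ref)
    return score


# Hypothetical chain: A cites B, B cites C, C cites D; D is not held.
chain = {"A": "B", "B": "C", "C": "D", "D": None}
held = {"A", "B", "C"}
print(lopez_score("A", held.__contains__, chain.get))  # 70
```

A collection that holds only the starting reference would score 10 under these assumptions, while one that holds all four links would score 150, which is how the weighting rewards depth.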

Use of Checklists in Database Evaluation 

Ever since so-called full-text databases emerged during 
the 1980s, the completeness of their coverage has been 
debated and, to some extent, researched. One of the earlier 
investigations of full-text database content, published by 
Pagell in 1987, bore the provocative and catchy-sounding 
subtitle, "How Full Is Full?" 37 A variety of methods have
been used or proposed to assess full-text database content 
coverage and quality, including Pagell's comparison of print 
issues with database coverage; Black's average JCR impact 
factor for journals contained in the database (that also were 
covered by JCR); and Jacso's summing of the impact factor 
of all of a database's JCR journals. 38 
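
The two impact-factor measures differ only in whether the figures are averaged or summed, as the following Python sketch shows. The function name, the dictionary of impact factors, and the journal labels are hypothetical; the sketch follows the description above in restricting both measures to database journals that JCR also covers, and it does not reproduce either author's actual data or procedure.

```python
def impact_factor_measures(database_titles, jcr_impact_factors):
    """Return (average, total) impact factor for the database's JCR journals."""
    # Only journals that JCR covers can contribute to either measure.
    covered = [jcr_impact_factors[t] for t in database_titles
               if t in jcr_impact_factors]
    total = sum(covered)
    average = total / len(covered) if covered else 0.0
    return average, total


# Hypothetical impact factors for illustration only.
jcr = {"Journal 1": 1.5, "Journal 2": 0.5, "Journal 3": 2.0}
database = ["Journal 1", "Journal 2", "Journal 9"]  # Journal 9 is not in JCR
print(impact_factor_measures(database, jcr))  # (1.0, 2.0)
```

The average is insensitive to database size, whereas the sum grows with every JCR journal added, so the two measures can rank the same databases differently.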

The checklist method also has been used to evaluate 
indexing and abstracting coverage, full-text content of databases,
or both. In this modification of the traditional checklist
approach, each item on the list is checked against the
database under evaluation rather than in a library's OPAC.
Typically, a list of journal titles is checked against the ven- 
dor's list of titles theoretically contained in the database. For 
example, Carr and Wolfe used core lists of education and 
biology journals to evaluate four electronic databases at the 
University of Wisconsin system libraries. 39 At the University 
of Hawaii at Manoa, Brier and Lebbin used Magazines for 
Libraries as a checklist to evaluate the title coverage of three 
databases. 40 Black used the list of journals covered in JCR to 
evaluate four full-text databases. 41 Jacobs, Woodfield, and 
Morris compiled core journal lists, based on local citations 
by British researchers, that were checked against the cover- 
age of four major databases as well as the British Library 
Document Supply Centre. 42 Instead of checking titles, 
Grzeszkiewicz and Hawbaker checked the articles from 
sample issues of journals subscribed to by the University of 
the Pacific Library in Business Index ASAP. 43 This literature 
review identified only two published cases in which citations 
were directly checked in databases — the method used in this 
study. Tyler, Boudreau, and Leach selected 6,170 citations 
from the first available 2000 issue of an unnamed number of 
core communication studies journals and checked them for 
coverage in three communication studies indexes and five 
multidisciplinary databases. 44 Schaffer used a sample of 368 
citations from more than 150 articles published by psychol- 
ogy department faculty at Texas A & M University between 
2000 and 2002 as a checklist (although that term is not used 
by Schaffer) for evaluating the content of twenty-six elec- 
tronic full-text databases licensed by the library. 45 
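
Checking citations directly in a database yields two coverage figures, one for full text and one for indexing, and the tally can be sketched in Python as below. The `lookup` callback and its three status labels are hypothetical, and counting a full-text hit as also indexed is an assumption made for this illustration rather than a rule taken from the studies cited above.

```python
from collections import Counter


def coverage_rates(citations, lookup):
    """Return percentage coverage for a list of citations.

    lookup(citation) is a hypothetical callback returning one of
    'full_text', 'indexed', or 'absent' for the database being checked.
    """
    if not citations:
        raise ValueError("no citations to check")
    counts = Counter(lookup(c) for c in citations)
    n = len(citations)
    return {
        "full_text": 100.0 * counts["full_text"] / n,
        # Assumption: an item available in full text is also indexed.
        "indexed": 100.0 * (counts["full_text"] + counts["indexed"]) / n,
    }


# Hypothetical results for five citations.
status = {"c1": "full_text", "c2": "indexed", "c3": "indexed",
          "c4": "absent", "c5": "absent"}
print(coverage_rates(list(status), status.get))
# {'full_text': 20.0, 'indexed': 60.0}
```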



Databases Evaluated in this Investigation 

Library Literature & Information Science Full Text, published
by H. W. Wilson, contains "full text of articles from
nearly 150 journals as far back as 1997" and indexing coverage
for four hundred journals dating to 1984. 46 Although
this database has undergone name changes and migration
from print to CD-ROM to a Web interface, it can be traced
to Library Literature, the well-known library science index
originally published in print format by the American Library
Association in 1921. 47 This product was chosen for evaluation
because of its pedigree and reputation as a premier
library and information science database.

Part of the EBSCOHOST suite of databases mar- 
keted by EBSCO, Academic Search Premier is advertised as 
"designed specifically for academic institutions" and offering 
"the world's largest, multidisciplinary full-text database." 48 
This product contains full text for "nearly" 4,650 serials, 
with backfiles as far back as 1975 "or further" for more than 
one hundred journals; furthermore, indexing and abstracts 
are provided for 8,200 titles. 49 This service was selected for 
investigation because it is an important multidisciplinary
database that includes library and information science with
an academic focus. 

One might ask why a specialized full-text database is
compared with a general one (rather than two specialized
or two general databases), and whether such a comparison
unfairly advantages the former when based on citations from
its discipline. Library Literature & Information Science
Full Text is the only full-text database specific to that disci- 
pline, as LISA: Library and Information Science Abstracts 
and Library, Information Science & Technology Abstracts 
are not advertised as full-text services. Academic Search 
Premier, although a multidisciplinary database, is known to 
have significant library and information science content and 
is actually listed under "library and information science" in 
the "Databases by Subject" menu selection on the Indiana 
University library Web page. 50 While a better performance
by Library Literature & Information Science Full Text would be
presumed, it is useful to gather empirical evidence to test
this assumption and to examine the differences in the two 
databases' coverage. At the project's conclusion, the results 
from the two databases were similar enough to suggest it 
was not unreasonable to compare them. 

Procedures 

The citations to periodicals in the 2004 bibliographical volumes
of Library Resources & Technical Services (LRTS),
volume 48, and Collection Building, volume 23, served as
the source for this investigation. All citations in endnotes 
(referred to as "references" in both journals) or appended 
in "further reading" or bibliography sections were consecu- 
tively numbered, classified by format, and entered into an 
Excel spreadsheet. Citations were counted according to the 
item-to-item link approach developed by Garfield and used 
in the Institute for Scientific Information's Web of Science. 
Thus, if a specific bibliographic item was cited twice in one 
article, it was counted as only one citation, but if cited in two 
different articles it counted as two citations. A small number 
of nonbibliographical items (editor inquiries to the author 
included as numbered footnotes apparently in error) were 
disregarded. 
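
The item-to-item counting rule can be illustrated with a minimal Python sketch. The pair representation, the function name `count_citations`, and the sample identifiers are hypothetical and are not taken from the study's spreadsheet; the sketch only shows that a repeated citation within one article collapses to a single link, whereas the same item cited in a second article counts again.

```python
from typing import Iterable, Tuple


def count_citations(links: Iterable[Tuple[str, str]]) -> int:
    """Count distinct (citing article, cited item) pairs."""
    return len(set(links))


# Hypothetical data: one pair per endnote as transcribed.
links = [
    ("article-A", "item-1"),
    ("article-A", "item-1"),  # same item cited again in the same article
    ("article-B", "item-1"),  # same item cited in a different article
    ("article-B", "item-2"),
]

total = count_citations(links)
print(total)  # 3
```

Under this rule the four transcribed endnotes yield three citations.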

The cited periodical titles were checked in the OCLC 
WorldCat database to verify their subject (based on Library 
of Congress classification number) and country of publica- 
tion. During the spring 2005 semester, the citations to peri- 
odicals were checked (by author and, if not found, by title) 
in two databases: Library Literature & Information Science Full Text
and Academic Search Premier. Each checked periodical 
citation was initially classified into one of four categories: 

1. a citation only
2. a citation plus an abstract
3. a full-text entry
4. no record in the database

An Excel spreadsheet was used to calculate the overall 
periodical coverage for LRTS and Collection Building in 
both databases; in other words, the distribution of the jour- 
nal's citations to periodicals among the four categories out- 
lined above. For purposes of final analysis, categories one 
and two were combined into a single indexing and abstract- 
ing coverage category. The spreadsheet also was used to 
tabulate the results by title and by publication date of the 
cited articles, facilitating analysis by those variables. Note 
that analysis by language, subject, and place of publication 
did not require a spreadsheet. 
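
The tabulation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category labels, the function name `summarize_coverage`, and the ten-item sample are hypothetical, and the study itself used an Excel spreadsheet rather than code. The sketch merges categories one and two into a single indexing and abstracting category and reports each share of the total.

```python
from collections import Counter


def summarize_coverage(results):
    """Return {category: (count, percent)} with categories one and two merged."""
    if not results:
        raise ValueError("no checked citations supplied")
    counts = Counter(results)
    total = len(results)
    merged = {
        "full text": counts["full text"],
        "indexed/abstracted": counts["citation only"]
        + counts["citation plus abstract"],
        "not covered": counts["no record"],
    }
    return {name: (n, round(100 * n / total, 1)) for name, n in merged.items()}


# Hypothetical checking results for ten citations (not data from the study).
sample = (
    ["full text"] * 2
    + ["citation only"] * 1
    + ["citation plus abstract"] * 3
    + ["no record"] * 4
)

summary = summarize_coverage(sample)
for name, (n, pct) in summary.items():
    print(f"{name}: {n} ({pct} percent)")
```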

Analysis of the Citations 

The 2004 LRTS contained 910 citations, counted according 
to the method described in the preceding section. Table 1 
presents a breakdown of these citations by format. A major- 
ity of the citations (60.0 percent) were to periodical articles, 
while books were the second most frequently cited format 
(12.4 percent). If the citations for books and book chapters 
(3.8 percent) are combined, 16.3 percent (calculated from
the raw data rather than by adding percentages) of the cita- 
tions were to monographs. The Web accounted for 11.7 
percent of the citations: 10.7 percent to Web documents and 
1.0 percent to Web sites. 
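
The distinction between a percentage calculated from the raw data and one obtained by adding rounded percentages can be checked with a short Python sketch. It uses the counts reported in table 1 (113 books, 35 book chapters, 910 citations in all); the helper name `pct` is an assumption introduced for illustration.

```python
TOTAL_CITATIONS = 910  # all LRTS citations, from table 1
BOOKS = 113
BOOK_CHAPTERS = 35


def pct(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Combine the raw counts first, then round once.
from_raw = pct(BOOKS + BOOK_CHAPTERS, TOTAL_CITATIONS)

# Round each format separately, then add the rounded figures.
from_rounded = round(pct(BOOKS, TOTAL_CITATIONS) + pct(BOOK_CHAPTERS, TOTAL_CITATIONS), 1)

print(from_raw)      # 16.3
print(from_rounded)  # 16.2
```

The two routes differ by a tenth of a percentage point, which is why the combined figures in the text are taken from the raw counts.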

Table 2's summary of journals cited in LRTS shows 
that LRTS itself was the most frequently cited title, with 
its 43 citations accounting for 7.9 percent of the 546 
total. The ten most cited journals (those cited 22 times 
or more) accounted for more than half the citations (52.0 
percent). Yet a total of 115 different titles were cited, 
with 62 cited only once, 14 twice, 9 cited three times, 
and 9 cited four times. In counting titles, a title change 
is considered a different title (following the policy of the 
Institute for Scientific Information). Accordingly, Library 
Acquisitions: Practice & Theory and its later title, Library
Collections, Acquisitions & Technical Services, are listed
separately. 
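
As a check on the concentration figure, the cumulative share of the ten most cited journals can be recomputed from the counts in table 2. The following Python sketch is illustrative only; the variable names are assumptions, and the cumulative percentages are computed from running totals of the raw counts rather than by adding rounded percentages.

```python
from itertools import accumulate

TOTAL_JOURNAL_CITATIONS = 546

# Times cited for the ten most cited journals in LRTS, in table 2 order.
top_ten_counts = [43, 42, 33, 27, 24, 24, 23, 23, 23, 22]

# Running totals of raw counts, each converted to a percentage of all citations.
cumulative_pct = [
    round(100 * running / TOTAL_JOURNAL_CITATIONS, 1)
    for running in accumulate(top_ten_counts)
]

print(cumulative_pct[0])   # 7.9
print(cumulative_pct[-1])  # 52.0
```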

The 2004 volume of Collection Building contained 
256 citations. Table 3 indicates that journal articles were 
the most frequently cited format (41.8 percent), although 
they accounted for a smaller proportion of citations than in 
LRTS. In contrast to LRTS, where monographs were the 
second most frequently cited format, Web sites (18.4 per- 
cent) and Web documents (9.8 percent) accounted for 28.1 
percent (calculated from raw data rather than by adding 
percentages) of the Collection Building citations, whereas 
books (19.9 percent) and book chapters (3.1 percent) com- 
prised 23.0 percent of the citations. 



The journals cited in Collection Building are displayed 
in table 4. The two most frequently cited journals, Collection
Building itself and Library Trends, both cited 11 times, contributed
20.6 percent of the 107 journal citations. The top
8 journals (contributing three or more citations) accounted 
for 43.0 percent of journal citations, while 45 titles were 
cited only once and 8 titles twice. Altogether, 61 different 
titles were cited in Collection Building. The listing of titles 
in tables 2 and 4 should definitely not be interpreted as a 
formal journal ranking. Rather, these titles are presented in 
order to provide information about the citations used as the 
basis for the evaluation. 

The periodical citation data for both LRTS and Collection 
Building conform to two well-known patterns: journal self- 
citation and the law of concentration and scatter. For a 
variety of reasons (the explanations for which are beyond 
this paper's scope), journals tend to cite themselves. Citation 
and use studies typically display a pattern of concentration in 
a small number of highly cited or used journals and scatter 
among a large number of infrequently cited or used titles. 

The publication dates of the periodical articles cited in
LRTS ranged from 1964 through 2004, while the periodical
articles cited by Collection Building were published from
1981 to 2004. Table 5 summarizes cited periodical publication
dates, organized into five-year intervals, for the two
journals. It is striking that for both journals the majority of
cited periodicals were published since 2000 and thus within
the most recent five years: 53.8 percent for LRTS and 57.9
percent for Collection Building. Only a small fraction of
both journals' periodical citations predate 1990 (8.4 percent
in LRTS and 2.8 percent in Collection Building), probably
reflecting the rapid changes within the field.

Table 1. Format of items cited in LRTS in 2004

Format                        No.       %
Journal articles              546    60.0
Books                         113    12.4
Web documents                  97    10.7
Conferences                    50     5.5
Book chapters                  35     3.8
Government documents           22     2.4
E-mail                         11     1.2
Web sites                       9     1.0
Zines                           7     0.8
Spec kits                       5     0.5
CD-ROM                          4     0.4
Master's theses                 4     0.4
Internal documents              3     0.3
Private conversations           2     0.2
Electronic discussion list      1     0.1
Video recording                 1     0.1
Total                         910    99.8

Note. Total does not add to 100% due to rounding.

Table 2. Summary of journals cited in LRTS

Title (N=115)                                           Times cited      %   Cumulative %
Library Resources & Technical Services                           43    7.9            7.9
Collection Management                                            42    7.7           15.6
College & Research Libraries                                     33    6.0           21.6
Cataloging & Classification Quarterly                            27    4.9           26.6
Against the Grain                                                24    4.4
Library Trends                                                   24    4.4           35.3
Collection Building                                              23    4.2
Journal of Library Administration                                23    4.2
Library Collections, Acquisitions & Technical Services           23    4.2           48.0
Information Technology & Libraries                               22    4.0           52.0
Serials Librarian                                                18    3.3           55.3
Serials Review                                                   14    2.6           57.9
Journal of Academic Librarianship                                13    2.4           60.3
Acquisitions Librarian                                           12    2.2           62.5
American Libraries                                               10    1.8           64.3
Library Hi Tech                                                   9    1.6           65.9
Library Acquisitions: Practice & Theory                           8    1.5           67.4
Library Quarterly                                                 7    1.3           68.7
ARL: A Bimonthly Report                                           6    1.1
D-Lib Magazine                                                    6    1.1
Portal: Libraries & the Academy                                   6    1.1           72.0
9 titles                                                          4    6.6           78.6
9 titles                                                          3    4.9           83.5
14 titles                                                         2    5.1           88.6
62 titles                                                         1   11.4            100
Total                                                           546   99.9

Notes. Cumulative percentages were calculated from the raw data, rather
than by adding percentages. Total does not add to 100% due to rounding.



Table 3. Format of items cited in Collection Building in 2004

Format                        No.       %
Journal articles              107    41.8
Books                          51    19.9
Web sites                      47    18.4
Web documents                  25     9.8
Book chapters                   8     3.1
Conferences                     8     3.1
Compact discs                   6     2.3
Ph.D. dissertation              1     0.4
Internal document               1     0.4
Master's thesis                 1     0.4
Newspaper article               1     0.4
Total                         256     100



Table 4. Summary of journals cited in Collection Building in 2004

Title (N=61)                                            Times cited      %   Cumulative %
Collection Building                                              11   10.3
Library Trends                                                   11   10.3           20.6
Collection Management                                             5    4.7
Library Collections, Acquisitions & Technical Services            5    4.7
Serials Review                                                    5    4.7           34.6
Journal of Academic Librarianship                                 3    2.8
Public Libraries                                                  3    2.8
Publishers Weekly                                                 3    2.8           43.0
Acquisitions Librarian                                            2    1.9
Against the Grain                                                 2    1.9
Booklist                                                          2    1.9
Malaysian Journal of Library & Information Science                2    1.9
Online Information Review                                         2    1.9
Reference & User Services Quarterly                               2    1.9
Rural Libraries                                                   2    1.9
Serials Librarian                                                 2    1.9           56.9
45 titles                                                         1   42.1            100
Total                                                           107  100.4

Notes. Cumulative percentages were calculated from the raw data, rather than
by adding percentages. Total does not add to 100% due to rounding.

Most citations were to journals published in the United
States. In LRTS, 90.1 percent of the 546 citations were to
United States journals, followed by 7.1 percent to journals
published in the United Kingdom, and 1.1 percent to
German journals. Four countries each received fewer than
1 percent of the LRTS citations: Denmark (0.9 percent),
Australia (0.4 percent), Canada (0.2 percent), and the
Netherlands (0.2 percent). The international citation rate
was somewhat higher in Collection Building, where 73.8
percent of 107 citations were to United States journals and 17.8
percent to United Kingdom titles. There was a smattering
of citations to seven other countries. Nigeria and Malaysia
each received 1.9 percent of the Collection Building journal
citations, and five countries received 0.9 percent (1 citation
only): Australia, India, Germany, the Netherlands, and the
Philippines. The journal citations were almost exclusively in
English, exceeding 99 percent in both journals. One citation
in Collection Building was in Spanish, while two citations in
LRTS were in German.

As noted in the preceding section, WorldCat was
consulted to determine the LC classification number for the
cited titles. Not unexpectedly, a strong majority of citations
in both LRTS and Collection Building were to journals
classified in Z (Bibliography, Library Science, Information
Resources [General]). In LRTS, 93.0 percent of the 546
journal citations were to Z-classified titles; specifically,
89.0 percent to "Libraries" (Z662 to Z1000.5); 2.0 percent to
"General Bibliography" (Z1001 to Z1121); 1.5 percent to
"Information Resources (General)" (ZA); and 0.5 percent to
"Book Industries and Trade" (Z116 to Z659). In addition to Z,
2.9 percent of LRTS citations were to P (Language and
Literature), while seven other broad classes were represented:
L (Education), 0.9 percent; Q (Science), 0.9 percent;
T (Technology), 0.5 percent; C (Auxiliary Sciences of
History), 0.4 percent; H (Social Sciences), 0.4 percent;
A (General Works), 0.2 percent; and M (Music and Books on
Music), 0.2 percent. Classification numbers were not
available for 0.5 percent of the citations.

For Collection Building, 82.2 percent of the 107 journal
citations were to Z, including 72.0 percent to
"Libraries," 9.3 percent to "General Bibliography," and 0.9
percent to "Information Resources (General)." In addition,
5.6 percent were to P; 3.7 percent to H; 2.8 percent to Q;
1.9 percent to E (American and U.S. history, other than
local); 1.9 percent to T; and 0.9 percent to L. A classification
number could not be determined for 0.9 percent of
the citations.



Results of Checking the Databases 

Overall Results 

The results of checking the citations in the two databases
under investigation are tabulated in table 6. In terms of
full-text entries, the best result, not unexpectedly, was found in
Library Literature & Information Science Full Text, which
contained full text for 21.1 percent of the total citations for
the two journals. Academic Search Premier held 16.1 percent of
the total citations in full-text format. While this article's
primary focus is on the comparison of databases rather than
journals, it is noteworthy that Collection Building receives
equal full-text coverage from Library Literature & Information
Science Full Text and Academic Search Premier, as both
provided 25.2 percent of its periodical citations in that form.
In contrast, LRTS's full-text coverage was higher in Library
Literature & Information Science Full Text (20.3 percent) than in
Academic Search Premier (14.3 percent).

Table 5. Analysis of periodical citations by publication date

Library Resources & Technical Services

Years         No.       %
2000-2004     294    53.8
1995-1999     159    29.1
1990-1994      47     8.6
1985-1989      29     5.3
1980-1984       8     1.5
Pre-1980        9     1.6
Total         546    99.9

Collection Building

Years         No.       %
2000-2004      62    57.9
1995-1999      29    27.1
1990-1994      13    12.1
1985-1989       2     1.9
1980-1984       1     0.9
Pre-1980        0     0.0
Total         107    99.9

Note. Totals do not add to 100% due to rounding.

Apart from the issue of full-text coverage, Library
Literature & Information Science Full Text contained some
type of record (full text or indexing and abstracting coverage)
for a much higher proportion of the citations than
did Academic Search Premier. For illustration, Academic
Search Premier had no coverage for 65.6 percent of LRTS
and 55.1 percent of Collection Building citations (63.9 percent
for both), whereas Library Literature & Information
Science Full Text covered all but 17.4 percent of LRTS's
citations and 24.3 percent of those in Collection Building
(18.5 percent in the two journals). With a small number
of exceptions, Library Literature & Information Science
Full Text listed only a citation, whereas Academic Search
Premier contained both a citation and abstract. This
paragraph's data show that when full-text entries and indexing
or abstracting coverage are collectively considered, LRTS
received fuller coverage than did Collection Building in
Library Literature & Information Science Full Text, while
Collection Building's coverage was better in Academic
Search Premier.

To provide some comparative results from similar stud- 
ies, Tyler, Boudreau, and Leach found that indexing cover- 
age in three communication studies indexes ranged from 
25.0 percent to 34.1 percent and from 4.0 percent to 77.8 
percent in five other multisubject databases. 51 Schaffer dis- 
covered that "less than one-third" of the cited articles in his 
study were available in full text in at least one of the twenty- 
six online databases licensed by Texas A&M. 52 

Finally, because this investigation is based on check- 
ing the entire universe of periodical citations in LRTS 
and Collection Building during 2004 rather than random 
samples, the use of statistical significance tests would be 
inappropriate. 

Results by Publication Date 

Table 7 analyzes the results by publication date. One can 
observe that items published during the 2000-2004 inter- 
val received the highest proportion of full-text coverage in 
both databases and for both journals, with the percentages
covered declining steadily across the two earlier five-year
intervals (1995-1999 and
1990-1994). For items published before 1990, only two
(LRTS citations covered in Academic Search Premier) 
received full-text coverage. When one examines the table's 
final column (which indicates the percentage of citations not
covered in the database as either full-text or indexing or 
abstracting entries), the proportions usually increase for the
older intervals, although a direct linear relationship does
not exist. Thus, one can conclude that coverage is generally 
higher for more recent citations, as would be expected. 
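
The two observations in this paragraph can be verified mechanically. The Python sketch below is an illustration, not part of the original analysis: the percentages are read from table 7, the zero entries for the 1990-1994 interval in Library Literature & Information Science Full Text are inferred from the row and column totals, and the helper names are assumptions.

```python
# Full-text coverage (percent) for 2000-2004, 1995-1999, and 1990-1994,
# read from table 7. The 0.0 entries are inferred from the table totals.
FULL_TEXT_PCT = {
    ("Academic Search Premier", "LRTS"): [16.0, 15.1, 10.6],
    ("Academic Search Premier", "Collection Building"): [32.3, 20.7, 7.7],
    ("Library Literature & Information Science Full Text", "LRTS"): [27.9, 18.2, 0.0],
    ("Library Literature & Information Science Full Text", "Collection Building"): [38.7, 10.3, 0.0],
}

# "Not covered" percentages for LRTS citations in Academic Search Premier,
# ordered from 2000-2004 back to pre-1980.
NOT_COVERED_ASP_LRTS = [61.9, 58.5, 83.0, 100.0, 100.0, 77.8]


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def non_decreasing(values):
    """True if no value is smaller than the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


full_text_declines = all(strictly_decreasing(v) for v in FULL_TEXT_PCT.values())
not_covered_rises_steadily = non_decreasing(NOT_COVERED_ASP_LRTS)

print(full_text_declines)          # True
print(not_covered_rises_steadily)  # False
```

Full-text coverage falls in every database and journal pairing as the intervals recede, while the share of uncovered citations rises only as a general tendency.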

Results by Country of Publication and Language 

Analysis of full-text coverage by country of publication found
a primary focus on the United States. For Academic Search
Premier, 77 of the 78 LRTS citations contained in full text
were published in the United States, and 1 was published
in the Netherlands, while 26 of the 27 full-text Collection
Building entries were published in the United States, and
1 in the United Kingdom. All 111 of the LRTS full-text
items in Library Literature & Information Science Full Text
were published in the United States, as were 22 of the 27
Collection Building full-text entries in that database, with
2 published in Malaysia and 1 each in Australia, India, and
Germany. A similar but less strong pro-United States bias
was found in the indexing or abstracting coverage for the
cited items. In the Academic Search Premier database, 106
of 110 LRTS indexing or abstracting entries were published
in the United States, with 4 in the United Kingdom, while
for Collection Building 20 of 21 originated in the United
States, and 1 in the United Kingdom. In Library Literature
& Information Science Full Text, 295 of 340 non-full-text
entries (in other words, indexing or abstracting entries) from
LRTS came from United States-published journals, whereas
35 were published in the United Kingdom, 4 in Denmark,
3 in Germany, 2 in Australia, and 1 in the Netherlands.
For Collection Building, 40 of 54 non-full-text entries were
United States-published, with 13 in the United Kingdom,
and 1 in the Netherlands.

Both databases held full-text entries for a markedly
larger proportion of the citations published in the United
States than those published outside the country. Academic
Search Premier contained 15.7 percent (77 of 492) of LRTS
citations published in the United States, versus 1.9 percent
(1 of 54) of those from other countries. For Collection
Building, 32.9 percent (26 of 79) of United States
publications were held in full text, contrasted to 3.6 percent
(1 of 28) for non-United States publications. In Library
Literature & Information Science Full Text, the percentage
for United States versus non-United States publications was
22.6 percent (111 of 492), contrasted with 0 percent (0 of
54) for LRTS, and 27.8 percent (22 of 79) compared with
17.9 percent (5 of 28) for Collection Building.
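
The full-text rates quoted above follow directly from the counts given in the text. As a minimal sketch, assuming a hypothetical helper named `rate` and a simple dictionary layout chosen for illustration, the comparison can be reproduced in Python:

```python
def rate(held, cited):
    """Percentage of cited items held, rounded to one decimal place."""
    return round(100 * held / cited, 1)


# (held in full text, citations) for United States and non-United States
# publications, taken from the counts quoted in the text.
FULL_TEXT_COUNTS = {
    ("Academic Search Premier", "LRTS"): {"US": (77, 492), "non-US": (1, 54)},
    ("Academic Search Premier", "Collection Building"): {"US": (26, 79), "non-US": (1, 28)},
    ("Library Literature & Information Science Full Text", "LRTS"): {"US": (111, 492), "non-US": (0, 54)},
    ("Library Literature & Information Science Full Text", "Collection Building"): {"US": (22, 79), "non-US": (5, 28)},
}

full_text_rates = {
    key: {origin: rate(*pair) for origin, pair in groups.items()}
    for key, groups in FULL_TEXT_COUNTS.items()
}

for (database, journal), rates in full_text_rates.items():
    print(f"{database} / {journal}: US {rates['US']}, non-US {rates['non-US']}")
```

In every pairing the United States rate exceeds the non-United States rate, although the gap is narrowest for Collection Building citations in Library Literature & Information Science Full Text.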

When limited to indexing or abstracting entries, a
stronger coverage for the United States also was observed
in Academic Search Premier but not Library Literature &
Information Science Full Text. The former database held
citations or abstracts for 21.5 percent (106 of 492) of the
United States publications in LRTS, and 25.3 percent (20 of
79) of the United States publications in Collection Building,
although it only held 7.4 percent (4 of 54) of LRTS's
non-United States citations and 3.6 percent (1 of 28) of Collection
Building's. In contrast, Library Literature & Information
Science Full Text contained non-full-text entries for a larger
proportion of LRTS citations published outside the United
States than inside the United States: 83.3 percent (45 of 54)
versus 60.0 percent (295 of 492). For Collection Building,
the percentages were almost identical: 50.6 percent (40 of
79) for United States publications, and 50.0 percent (14 of
28) for non-United States publications.

Table 6. Results from searching LRTS and Collection Building citations in two databases

Academic Search Premier
                               Full-text entry  Indexed/abstracted         Not covered
                       No.         No.       %         No.       %         No.       %
LRTS                   546          78    14.3         110    20.1         358    65.6
Collection Building    107          27    25.2          21    19.6          59    55.1
Total                  653         105    16.1         131    20.1         417    63.9

Library Literature & Information Science Full Text
                               Full-text entry  Indexed/abstracted         Not covered
                       No.         No.       %         No.       %         No.       %
LRTS                   546         111    20.3         340    62.3          95    17.4
Collection Building    107          27    25.2          54    50.5          26    24.3
Total                  653         138    21.1         394    60.3         121    18.5

Table 7. Analysis of searching results by publication date

Academic Search Premier: citations from LRTS
                               Full-text entry  Indexed/abstracted         Not covered
Publication Date       No.         No.       %         No.       %         No.       %
2000-2004              294          47    16.0          65    22.1         182    61.9
1995-1999              159          24    15.1          42    26.4          93    58.5
1990-1994               47           5    10.6           3     6.4          39    83.0
1985-1989               29           0       0           0       0          29     100
1980-1984                8           0       0           0       0           8     100
Pre-1980                 9           2    22.2           0       0           7    77.8
Total                  546          78    14.3         110    20.1         358    65.6

Academic Search Premier: citations from Collection Building
                               Full-text entry  Indexed/abstracted         Not covered
Publication Date       No.         No.       %         No.       %         No.       %
2000-2004               62          20    32.3          12    19.4          30    48.4
1995-1999               29           6    20.7           5    17.2          18    62.1
1990-1994               13           1     7.7           4    30.8           8    61.5
1985-1989                2           0       0           0       0           2     100
1980-1984                1           0       0           0       0           1     100
Total                  107          27    25.2          21    19.6          59    55.1

Library Literature & Information Science Full Text: citations from LRTS
                               Full-text entry  Indexed/abstracted         Not covered
Publication Date       No.         No.       %         No.       %         No.       %
2000-2004              294          82    27.9         172    58.5          40    13.6
1995-1999              159          29    18.2         103    64.8          27    17.0
1990-1994               47           0       0          36    76.6          11    23.4
1985-1989               29           0       0          27    93.1           2     6.9
1980-1984                8           0       0           1    12.5           7    87.5
Pre-1980                 9           0       0           1    11.1           8    88.9
Total                  546         111    20.3         340    62.3          95    17.4

Library Literature & Information Science Full Text: citations from Collection Building
                               Full-text entry  Indexed/abstracted         Not covered
Publication Date       No.         No.       %         No.       %         No.       %
2000-2004               62          24    38.7          22    35.5          16    25.8
1995-1999               29           3    10.3          20    69.0           6    20.7
1990-1994               13           0       0          11    84.6           2    15.4
1985-1989                2           0       0           1    50.0           1      50
1980-1984                1           0       0           0       0           1     100
Total                  107          27    25.2          54    50.5          26    24.3

These data suggest that Library Literature &
Information Science Full Text has a stronger international
coverage, as far as indexing or abstracting entries are
concerned, than Academic Search Premier. Regarding language,
there was no record in either database for the one Spanish
citation in Collection Building or the two German citations
in LRTS, although these numbers are obviously too small to
allow conclusions about language coverage. 

Results by LC Classification 

A breakdown by classification number revealed that most of 
the entries in both databases were classed in the Z segment 
for "Libraries" (Z662-Z1000.5). In the Academic Search 
Premier database, 54 of the 78 LRTS full-text entries were 
classified there, while 13 were classed in P, 4 in Q, 3 in L, 2 
in T, 1 in H, and 1 in M. Also, 101 of 110 LRTS indexing or 
abstracting entries were classified in the "Libraries" range 
of Z, 4 in Z's "General Bibliography" segment, 2 in L, 2 in P, 
and 1 in Q. Of the 27 full-text Collection Building entries in 
Academic Search Premier, 15 were classed in Z's "Libraries" 
range, 5 in P, 4 in Z's "General Bibliography," 1 in E, 1 in 
H, and 1 in T. For Collection Building's 21 non-full-text
items, 15 were in Z's "Libraries" range, 3 in Z's "General 
Bibliography," 1 in H, 1 in P, and 1 in Q. 

Classification analysis of the Library Literature &
Information Science Full Text database found that 109
of the 111 LRTS full-text entries fell in Z's "Libraries" range,
with 1 in L, and 1 in M. For the 340 indexing or abstracting
entries, 310 were in the Z range for "Libraries," 14 in
P, 9 in Z's "General Bibliography," 5 in ZA, and 2 in C. Of
the 27 Collection Building full-text entries, 23 were in Z's 
"Libraries" section, 3 in Z's "General Bibliography," and 1 in 
T. For non-full-text entries, 43 of 54 fell in the "Libraries" 
range of Z, 5 in P, 4 in Z's "General Bibliography," 1 in ZA, 
and 1 in H. 

Further analysis revealed that Academic Search Premier 
was more likely to contain a full-text entry if the citation 
were classed outside the Z range for "Libraries." The data- 
base held full-text entries for 11.1 percent (54 of 486) of the 
LRTS citations classed in Z's "Libraries" section, contrasted 
to 40.0 percent (24 of 60) for all the remaining citations 
classed elsewhere. For Collection Building in Academic 
Search Premier, 19.5 percent (15 of 77) of the citations in 
Z's "Libraries" range were held in full text, whereas 40.0 
percent (12 of 30) of the citations classified elsewhere were 
found in full text. However, this pattern did not hold up for 
indexing or abstracting coverage. Academic Search Premier 
contained non-full-text entries for 20.8 percent (101 of 
486) of the LRTS citations classed in "Libraries," compared
to 15.0 percent (9 of 60) for the citations classed outside
Z's "Libraries." Furthermore, 19.5 percent (15 of 77) of
Collection Building's citations classed under "Libraries"
were included in the database as indexing or abstracting
entries — a percentage almost identical to the 20.0 percent 
(6 of 30) of the citations classed elsewhere that were indexed 
or abstracted. 

In contrast to Academic Search Premier, Library
Literature & Information Science Full Text held, in both
full-text and indexing or abstracting form, a higher proportion
of the two journals' citations classed in Z's "Libraries" range
than of those classified elsewhere. For LRTS, it contained 22.4
percent of the former (109 of 486), contrasted with 3.3 percent
(2 of 60) of the latter in full text, and 63.8 percent of the
former (310 of 486), contrasted with 50.0 percent (30 of 60)
of the latter as indexing or abstracting entries. For Collection
Building, the corresponding data were 29.9 percent (23 of 
77), contrasted with 13.3 percent (4 of 30) for full text, and 
55.8 percent (43 of 77) versus 36.7 percent (11 of 30) for 
indexing or abstracting. 

Results by Title 

Another approach is to analyze database coverage by title 
rather than by citation, as has been done in the preceding 
sections. In Academic Search Premier, of the 115 titles cited 
in LRTS, 15 titles (13.0 percent) received full-text cover- 
age for all citations, 10 titles (8.7 percent) received partial 
full-text coverage (in other words, some but not all citations 
were in full text), 7 (6.1 percent) received full indexing or 
abstracting coverage, 7 titles (6.1 percent) received partial 
indexing or abstracting coverage, and 76 titles received no 
coverage (66.1 percent). For the 61 titles cited in Collection 
Building, 10 (16.4 percent) received complete full-text 
coverage, 2 (3.3 percent) partial full-text coverage, 11 (18.0 
percent) complete indexing or abstracting coverage, and 
38 (62.3 percent) no coverage. For Library Literature &
Information Science Full Text, the coverage for the 115 titles
cited in LRTS was: complete full-text coverage, 11 (9.6
percent); partial full-text coverage, 11 (9.6 percent);
complete indexing or abstracting coverage, 41 (35.7 percent);
partial indexing or abstracting coverage, 10 (8.7 percent);
and no coverage, 42 (36.5 percent). The corresponding
data for the 61 Collection Building titles stand at:
complete full-text coverage, 11 (18.0 percent); partial full-text
coverage, 3 (4.9 percent); complete indexing or abstracting
coverage, 22 (36.1 percent); and no coverage, 25
(41.0 percent).

Because it is highly skewed by the large number of titles 
cited only once, this breakdown by title is less useful than 
analysis by citation. Yet it offers the benefit, unlike some 
previous checklist evaluations of database coverage, of
demonstrating incomplete or mixed coverage for some titles. For
example, of the 33 citations to College & Research Libraries
in LRTS, Library Literature & Information Science Full
Text provided full text for 20, indexing or abstracting for 10, 
and no coverage for 3. (Note that this was counted as partial 
full-text coverage in the preceding title analysis.) 
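
The title-level categories can be expressed as a small decision rule. The Python sketch below is one reading of the categories, offered under stated assumptions rather than as the procedure actually used: the function name `classify_title` is hypothetical, and the precedence (any full-text entry short of all citations counts as partial full-text coverage, ahead of indexing or abstracting) is inferred from the College & Research Libraries example.

```python
def classify_title(full_text, indexed, not_covered):
    """Assign a cited title to one coverage category from its citation counts."""
    total = full_text + indexed + not_covered
    if total == 0:
        raise ValueError("a cited title must have at least one citation")
    if full_text == total:
        return "complete full-text coverage"
    if full_text > 0:
        return "partial full-text coverage"
    if indexed == total:
        return "complete indexing or abstracting coverage"
    if indexed > 0:
        return "partial indexing or abstracting coverage"
    return "no coverage"


# College & Research Libraries as cited in LRTS: 20 citations in full text,
# 10 indexed or abstracted, and 3 not covered.
crl_category = classify_title(20, 10, 3)
print(crl_category)  # partial full-text coverage
```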

Limitations to the Study 

A number of limitations to this investigation are acknowl- 
edged. One would not expect to find many of the citations 
in LRTS and Collection Building in a library and information 
science database such as Library Literature & Information
Science Full Text because they are from other disciplines.
Because database content frequently changes, this investigation's
results represent a snapshot as of spring 2005. Also,
items not held in the two databases under investigation
might have been available in others licensed by the library,
such as ScienceDirect. Finally, coverage is only one among
many factors in database evaluation, along with pricing 
structure, licensing terms, search features, screen display, 
accuracy of records, compatibility with technological infra- 
structure, and others. 

Summary and Conclusions 

A citation-based checklist addresses the extent to which a 
collection or database would meet the needs of researchers. 
This research shows that both databases provide full-text 
entries for only a fraction of the articles cited by LRTS and 
Collection Building authors in 2004. Thus, the answer to 
that catchy-sounding question, "How full is full?" is, in this 
instance, "not very full." However, one should acknowl- 
edge that neither database provider claims complete full- 
text coverage. 

Library Literature & Information Science Full Text is
somewhat better than Academic Search Premier for full- 
text coverage (containing 21.1 percent versus 16.1 percent 
of the citations), but far more likely to contain an indexing 
or abstracting record of a cited item — all but 18.5 percent 
were found there, compared to 63.9 percent in Academic 
Search Premier. The latter finding would logically imply that 
Library Literature & Information Science Full Text is clearly
the preferred database for users interested in identifying 
existing resources on a library and information science topic 
even though they may not have immediate access through 
the database. 

Generally, full text and indexing and abstracting cover- 
age is stronger for more current citations and declines with 
citation age. The fact that more than half the periodical 
citations in LRTS and Collection Building date from 2000 
or later, with fewer than 10 percent predating 1990, suggests
that, at least in reference to these two journals, deep back-
runs in a database covering library and information science
may not be of vital importance. 

The full-text coverage of both databases is highly 
skewed toward United States publications, as measured by 
the percentage of United States versus non-United States 
citations held and the origin of those citations actually held 
in full text. Academic Search Premier's indexing or abstracting
coverage also is skewed towards the United States, but,
somewhat surprisingly, Library Literature & Information
Science Full Text provides indexing or abstracting for a higher
proportion of non-United States than United States citations
in LRTS, and essentially equal coverage for Collection
Building. Consequently, in terms of overall international
coverage, Library Literature & Information Science Full
Text performs better than Academic Search Premier.

Analysis by the LC classification system, which serves 
as a proxy for the cited item's subject, shows that Library 
Literature & Information Science Full Text provides better
full-text and indexing or abstracting coverage for items
classed in Z's "Libraries" section than for those classed else- 
where. Academic Search Premier provides stronger full-text 
coverage for citations classed outside Z's "Libraries" seg- 
ment, although that pattern does not hold for its indexing or 
abstracting coverage. In the final analysis, Academic Search 
Premier provides better overall coverage for items outside 
traditional librarianship than does Library Literature &
Information Science Full Text — an expected finding, as 
Academic Search Premier advertises itself as a multidisci- 
plinary database. 

The author makes no explicit recommendation con- 
cerning whether a library should license either, both, or 
neither of the databases investigated here. Such a licensing 
decision would incorporate a variety of additional factors, 
such as budgetary considerations, collecting priorities, cur- 
ricular and teaching needs, researcher interests, and other 
databases already licensed, that would vary from library 
to library. 

This research is significant because it serves as further 
evidence that the citation-based checklist technique can be 
adapted to database content evaluation. A detailed analysis 
of coverage by such critical parameters as publication date, 
country of origin, and subject is offered. Moreover, the
study represents the first known application of the tech- 
nique to database evaluation for the field of library and 
information science. 

Some questions for future research should be men- 
tioned. What proportion of the items would be available in 
other electronic databases licensed by the library, elsewhere 
on the Web through open-access journals or author self- 
archiving, or in the library's print collection? How often 
would patrons wanting a specific item successfully locate it 
in full-text form in the two databases under evaluation? As
database content is commonly believed to be unstable, what
results would be obtained by searching the same databases 
at later points for longitudinal comparison? 

References 

1. Barbara Lockett, Guide to the Evaluation of Library 
Collections (Chicago: ALA, 1989), 5. 

2. Ibid., 6; Anne H. Lundin, "List-Checking in Collection 
Development: An Imprecise Art," Collection Management 11, 
no. 3/4 (1989): 103-12; Thomas E. Nisonger, "A Test of Two 
Citation Checking Techniques for Evaluating Political Science 
Collections in University Libraries," Library Resources &
Technical Services 27, no. 2 (Apr./June 1983): 163-76. 

3. Paul H. Mosher, "Quality and Library Collections: New 
Directions in Research and Practice in Collection Evaluation," 
in Advances in Librarianship, vol. 13, ed. Wesley Simonton, 
211-38 (Orlando, Fla.: Academic Pr., 1984). 

4. C. C. Jewett, "Report of the Assistant Secretary Relative 
to the Library, Presented December 13, 1848," in Third 
Annual Report of the Board of Regents of the Smithsonian
Institution to the Senate and House of Representatives, 39-47 
(Washington, D.C.: Tippin and Streeper, 1849). 

5. Paul H. Mosher, "Collection Evaluation in Research Libraries: 
The Search for Quality, Consistency, and System in Collection 
Development," Library Resources & Technical Services 23,
no. 1 (Winter 1979): 16-32. The original report, M. Llewellyn 
Raney, The University Libraries (Chicago: Univ. of Chicago 
Pr., 1933), was unavailable to the author of this article. 

6. Douglas Waples and Harold D. Lasswell, National Libraries 
and Foreign Scholarship: Notes on Recent Selections in Social 
Science (Chicago: Univ. of Chicago Pr., 1936).

7. Robert Peerling Coale, "Evaluation of a Research Library 
Collection: Latin-American Colonial History at the Newberry,"
Library Quarterly 35, no. 3 (July 1965): 173-84; William 
Webb, "Project CoEd: A University Library Collection 
Evaluation and Development Program," Library Resources 
& Technical Services 13, no. 4 (Fall 1969): 457-62. 

8. Robert W. Burns Jr., Evaluation of the Holdings in Science/ 
Technology in the University of Idaho Library (Moscow, 
Idaho: Univ. of Idaho Library, 1968); Christina E. Bolgiano 
and Mary Kathryn King, "Profiling a Periodicals Collection," 
College & Research Libraries 39, no. 1 (Jan. 1978): 99-104;
Richard D. Shiels and Martha S. Alt, "Library Materials 
on the History of Christianity at Ohio State University: An 
Assessment," Collection Management 7, no. 2 (Summer 1985): 
69-81; Cheryl Taranto and Anna H. Perrault, "An Evaluation 
of the Music Collection at Louisiana State University," LLA 
Bulletin 51, no. 2 (Fall 1988): 89-92; Maria A. Porta and 
F. Wilfrid Lancaster, "Evaluation of a Scholarly Collection 
in a Specific Subject Area by Bibliographic Checking: A 
Comparison of Sources," Libri 38, no. 2 (June 1988): 131-37; 
Kathleen Kehoe and Elida B. Stein, "Collection Assessment 
of Biotechnology Literature," Science & Technology Libraries
9, no. 3 (Spring 1989): 47-55; Marina Snow, "Theatre Arts 
Collection Assessment," Collection Management 12, no. 3/4 
(1990): 69-89; Russell F. Dennison, "Quality Assessment of 
Collection Development Through Tiered Checklists: Can
You Prove You Are a Good Collection Developer?" Collection
Building 19, no. 1 (2000): 24-26; Brian Flaherty, "Assessing 
Legal Collections: Trying to Eke Out a Method from the 
Madness," Against the Grain 14, no. 1 (Feb. 2002): 66-68, 
70; Chris Matz, "Collecting Comic Books for an Academic 
Library," Collection Building 23, no. 2 (2004): 131-37. 

9. Jeffry Larson, "The RLG Conspectus French Literature 
Collection Assessment Project," Collection Management 6 
(Spring/Summer 1984): 97-114. 

10. Lockett, Guide to the Evaluation of Library Collections. 

11. Ibid., 6. 

12. Nisonger, "A Test of Two Citation Checking Techniques 
for Evaluating Political Science Collections in University 
Libraries." 

13. Terese Heidenwolf, "Evaluating an Interdisciplinary Research 
Collection," Collection Management 18, no. 3/4 (1994): 34; 
William L. Emerson, "Adequacy of Engineering Resources 
for Doctoral Research in a University Library," College &
Research Libraries 18, no. 6 (Nov. 1957): 455-60, 504. 

14. Jewett, "Report of the Assistant Secretary Relative to the 
Library, Presented December 13, 1848." 

15. Nisonger, "A Test of Two Citation Checking Techniques 
for Evaluating Political Science Collections in University 
Libraries." 

16. Heidenwolf, "Evaluating an Interdisciplinary Research 
Collection," 33-48. 

17. Maureen L. Gleason and James T. Deffenbaugh, "Searching 
the Scriptures: A Citation Study in the Literature of Biblical 
Studies: Report and Commentary," Collection Management 6, 
no. 3/4 (Fall/Winter 1984): 107-17. 

18. Jill V. Crawley-Low, "Collection Analysis Techniques Used to 
Evaluate a Graduate-Level Toxicology Collection," Journal 
of the Medical Library Association 90, no. 3 (July 2002): 
310-16. 

19. Porta and Lancaster, "Evaluation of a Scholarly Collection in 
a Specific Subject Area by Bibliographic Checking." 

20. Jewett, "Report of the Assistant Secretary Relative to the 
Library, Presented December 13, 1848." 

21. Kathy E. Gallagher, "The Application of Selected Evaluative 
Measures to the Library's Monographic Ophthalmology 
Collection," Bulletin of the Medical Library Association 
69, no. 1 (Jan. 1981): 36-39; F. W. Newell, Ophthalmology: 
Principles and Concepts, 4th ed. (St. Louis: Mosby, 1978).

22. Maureen Martin Watson, "The Association of Vision 
Science Librarians' Citation Analysis of Duane's Clinical 
Ophthalmology," Journal of the Medical Library Association 
91, no. 1 (Jan. 2003): 83-85; Thomas David Duane, 
William Tasman, and Edward A. Jaeger, Duane's Clinical 
Ophthalmology, rev. ed. (Philadelphia: Lippincott Williams 
and Wilkins, 1998). 

23. Robert N. Bland, "The College Textbook As a Tool for 
Collection Evaluation, Analysis, and Retrospective Collection 
Development," Library Acquisitions: Practice & Theory 4,
no. 3/4 (1980): 193-97. 

24. Ibid.; Roger Edward Stelk and F. Wilfrid Lancaster, "The Use 
of Textbooks in Evaluating the Collection of an Undergraduate 
Library," Library Acquisitions: Practice & Theory 14, no. 2
(1990): 191-93. 






25. William W. Currie, "Evaluating the Collection of a Two-Year 
Branch Campus by Using Textbook Citations," Community & 
Junior College Libraries 6, no. 2 (1989): 75-79. 

26. Emerson, "Adequacy of Engineering Resources for Doctoral 
Research in a University Library." 

27. Jean-Pierre V. M. Herubel, "Simple Citation Analysis 
and the Purdue History Periodical Collection," Indiana 
Libraries 9, no. 2 (1990): 18-21; Jean-Pierre V. M. Herubel, 
"Philosophy Dissertation Bibliographies and Citations in 
Serials Evaluation," Serials Librarian 20, no. 2/3 (1991):
65-73. 

28. Marion L. Buzzard and Doris E. New, "An Investigation 
of Collection Support for Doctoral Research," College &
Research Libraries 44, no. 6 (Nov. 1983): 469-75. 

29. Carol M. Moulden, "Evaluation of Library Collection Support 
for an Off-Campus Degree Program," in The Off-Campus 
Library Services Conference Proceedings; Charleston, South 
Carolina, October 20-21, 1988, ed. Barton M. Lessin, 340-16 
(Mount Pleasant, Mich: Central Michigan Univ., 1989). 

30. D. E. Lewis, "A Comparison between Library Holdings and 
Citations," Library & Information Research News 11, no. 43 
(Autumn 1988): 18-23. 

31. James G. Neal and Barbara J. Smith, "Library Support of 
Faculty Research at the Branch Campuses of a Multi-campus 
University," Journal of Academic Librarianship 9, no. 5 (Nov. 
1983): 276-80. 

32. Stephanie C. Haas and Kate Lee, "Research Journal Usage by 
the Forestry Faculty at the University of Florida, Gainesville," 
Collection Building 11, no. 2 (1991): 23-25.

33. Manuel D. Lopez, "A Guide for Beginning Bibliographers,"
Library Resources & Technical Services 13, no. 4 (Fall 1969): 
462-70. 

34. Ibid., 469-70. 

35. Ibid. 

36. Ibid.; Thomas E. Nisonger, "An In-Depth Collection 
Evaluation at the University of Manitoba Library: A Test of 
the Lopez Method," Library Resources & Technical Services 
24, no. 4 (Fall 1980): 329-38. 

37. Ruth Pagell, "Searching Full-Text Periodicals: How Full Is 
Full?" Database 10, no. 5 (Oct. 1987): 33-36. 

38. Ibid.; Steve Black, "An Assessment of Social Sciences 
Coverage by Four Prominent Full-Text Online Aggregated 
Journal Packages," Library Collections, Acquisitions, &
Technical Services 23, no. 4 (Winter 1999): 411-19; Peter 
Jacso, "Evaluating the Journal Base of Databases Using 
the Impact Factor of the ISI Journal Citation Reports," in 
National Online Meeting Proceedings 2000: Proceedings of the
21st National Online Meeting, New York, May 16-18, 2000,
ed. Martha E. Williams, 169-72 (Medford, N.J.: Information 
Today, 2000). 

39. Jo Ann Carr and Amy Wolfe, "Core Journal Titles in Full-Text 
Databases," in Racing Toward Tomorrow: Proceedings of the
Ninth National Conference of the Association of College and 
Research Libraries April 8-11, 1999, ed. Hugh A. Thompson, 
234-41 (Chicago: ACRL, 1999). 

40. David J. Brier and Vickery Kaye Lebbin, "Evaluating Title 
Coverage of Full-Text Periodical Databases," Journal of 
Academic Librarianship 25, no. 6 (Nov. 1999): 473-78; Bill 
Katz and Linda Sternberg Katz, Magazines for Libraries, 9th 
ed. (New York: Bowker, 1997). 

41. Black, "An Assessment of Social Sciences Coverage by Four 
Prominent Full-Text Online Aggregated Journal Packages." 

42. N. Jacobs, J. Woodfield, and A. Morris, "Using Local Citation 
Data to Relate the Use of Journal Articles by Academic 
Researchers to the Coverage of Full-Text Document Access 
Systems," Journal of Documentation 56, no. 5 (Sept. 2000): 
563-81. 

43. Anna Grzeszkiewicz and A. Craig Hawbaker, "Investigating a 
Full-Text Journal Database: A Case of Detection," Database 
19, no. 6 (Dec. 1996): 59-62. 

44. David C. Tyler, Signe O. Boudreau, and Susan M. Leach, "The 
Communication Studies Researcher and the Communication 
Studies Indexes," Behavioral & Social Sciences Librarian 23,
no. 2 (2005): 119-46. 

45. Thomas Schaffer, "Psychology Citations Revisited: Behavioral 
Research in the Age of Electronic Resources," Journal of 
Academic Librarianship 30, no. 5 (Sept. 2004): 354-60.

46. H. W. Wilson, "Library and Information Science Full Text," 
www.hwwilson.com/databases/liblit.htm (accessed Sept. 20,
2006). 

47. Indiana University, Herman B Wells Library, "IUCAT" 
[online public access catalog], www.iucat.iu.edu/authenticate 
.cgi?status=start (accessed Jan. 12, 2007). 

48. EBSCOHOST Web, "Academic Search Premier," http://web 
.ebscohost.com/ehost/selectdb?hid=lll&sid=159646b9 
-7f69-4b2a-9cec-8dbcb2fd82cf%40sessionmgrl02 (accessed 
Sept. 20, 2006). 

49. Ibid. 

50. Indiana University, Herman B Wells Library, "Databases 
by Subject," www.libraries.iub.edu/index.php?pageId=1697&
subjectId=100&mode=subjectId (accessed Apr. 5, 2007).

51. Tyler, Boudreau, and Leach, "The Communication Studies 
Researcher and the Communication Studies Indexes," 35. 

52. Schaffer, "Psychology Citations Revisited," 354. 






Trudi Bellardo Hahn (thahn@umd.edu) is 
a visiting professor, College of Information 
Studies, University of Maryland, College 
Park. 



This article is based on a presentation 
given at "Converting and Preserving 
the Scholarly Record," Eighth Annual
Symposium on Scholarly Communica- 
tion, State University of New York, 
Albany, October 24, 2006. 

Submitted February 12, 2007; tentatively 
accepted pending revision March 31, 
2007; revised and resubmitted April 7, 
2007, and accepted for publication. 



Mass Digitization 

Implications for Preserving 
the Scholarly Record 

By Trudi Bellardo Hahn 



Libraries and archives have a critical role in preserving the scholarly record; 
many players in the publication cycle depend on them for this. Preservation of 
scholarly books that are being digitized has lagged far behind preservation initia- 
tives for electronic journals. The issue has become more critical, as large commer- 
cial companies such as Google, Yahoo, and Microsoft have begun mass digitization 
of millions of books in research libraries. Since December 2004, the pace of devel- 
opments has been rapid, involving great risks on Google's part over the copyright 
issue. Google and certain participating libraries have not addressed the issue of 
whether or not all this effort to digitize huge numbers of books indiscriminately 
will serve students' and scholars' needs in the long run. Quality, secrecy, and 
long-term stability are all issues that suggest it may be foolish to expect that 
commercial companies will share librarians' values and commitment to digitized 
material preservation. The information profession must exert strong leadership 
in setting policies, standards, and best practices for long-term preservation of the 
scholarly record. 

Libraries and archives that serve the scholarly community have a solemn 
responsibility to preserve the scholarly record. What these institutions do 
(or fail to do) will have an impact on all the players in the arena of what has 
been called the "publication cycle." The players in the cycle include publishers, 
editors, reviewers, librarians, archivists, readers, and, of course, scholars them- 
selves. Converting and preserving scholarly materials are generally seen as the 
last steps in the cycle — if the cycle can be said to have an end. The experts in 
converting scholarly materials from paper or other tangible materials to digital 
formats, and in preserving those digitized documents, are only a small subset of 
this large community, and I am not one of those technical experts. Nonetheless, I 
am among the stakeholders in the community affected. My observations are from 
the perspective of an informed, objective, and concerned eyewitness to current 
developments. 

Side-stepping the issues and developments surrounding digitized and born- 
digital journals, I will focus on the programs for mass digitizing of books and other, 
nonjournal scholarly materials by such companies as Google, Yahoo, Microsoft, 
and others. Interestingly, in the past, these companies were never considered 
part of the scholarly publication community — a fact that makes their abrupt and 
explosive entrance onto the scene not only unexpected, but also unsettling. 

My issues and concerns are organized into five areas: pace of developments, 
foolish risk versus vision, justification for digitizing books, trust, and leadership. 
Each of these has implications for preservation and long-term access to digital 
documents. Preservation and access go hand-in-glove, but they are not the same. 
Most of my observations focus on Google; they are by far the biggest and most
controversial player, the eight-hundred-pound gorilla. I
will mention activities of Yahoo and Microsoft as well; the 
implications for preservation are similar. 

Pace of Developments 

Is this all happening too fast? Digitizing library books and 
making scholarly collections available on the Web have 
been around for more than a decade. Since the commer- 
cial world, in the form of such deep-pocket companies as 
Google, Yahoo, and Microsoft, has come into the academy, 
the pace has sped up enormously. Is the pace too fast to 
make good policy? Is it too fast to ponder and debate dif- 
ficult issues and make decisions that will benefit all of us in 
the long term? 

Some of us are still digesting Google's startling 
announcement in December 2004 that it will be working 
with five major research libraries to digitize more than 
fifteen million books from their collections in exchange for 
providing these libraries with digital copies of their books. 
Google will load the copies into their own digital library and 
make full-text versions available if they are in the public 
domain, or brief excerpts — snippets — if they are still under 
copyright protection. The project promises to cost millions 
of dollars — perhaps as much as a billion — and to take six to 
ten years to complete. Previously, the libraries involved had 
thought that such a digitization project would take far lon- 
ger — when library staff at the University of Michigan were 
asked in 2004 how long it would take to digitize Michigan's 
seven million volumes, "the answer was more than a 1,000 
years." 1 The project was — and still is — staggering in its 
speed and daring. 

In the two years since that announcement, a rapid 
succession of announcements from Google and other orga- 
nizations astonished us with the scope and potential for 
enormous impact: 

• The Seattle Times reported on October 3, 2005, 
that Yahoo and Microsoft would team up with the 
nonprofit Internet Archive as well as several other 
large research libraries and archives in establish- 
ing the Open Content Alliance (OCA) to pool the 
collections of a large number of research libraries. 2 
According to the article, OCA will only digitize those 
materials in the public domain, including handwritten 
manuscripts, unless copyright holders give explicit 
permission to digitize. The complete books will be 
freely available in a permanent archive. Funding and 
support will come from Yahoo and Microsoft as well as 
from participating libraries and other companies, such 
as Hewlett-Packard Labs, LibriVox, Octavo, Lulu.com,
and Adobe. It appears that everybody wants to
get in on the act! That Yahoo and Microsoft jumped
on this bandwagon is not surprising. Yahoo is Google's 
archrival, and Microsoft's share of the searching mar- 
ket is growing all the time — it is, after all, "the default 
search engine built into the default Web browser 
available right out of the computer box." 3 

• The Financial Times (London) announced on 
November 4, 2005, that Microsoft is investing in a 
digitization project of 100,000 books from the British 
Library. 4 

• James H. Billington, Librarian of Congress, wrote 
an article for the Washington Post on November 22, 
2005, about the Library of Congress (LC) receiving 
$3 million from Google to jump-start their digital 
archive of international cultural artifacts, the World 
Digital Library. 5 

• On March 7, 2006, the Australian reported that the 
European Commission (EC) plans to make at least six 
million books, documents, and other cultural works 
available by 2010. The EC will contribute $72 million 
to the digital library, and expects member states to 
make up the remaining $250 to $300 million to com- 
plete the project. Its goal is to combat other digitiza- 
tion projects that have an Anglo-American-centric 
view of history. The EC says it is not going after 
copyrighted works, but does not reveal details of the 
program, which has publishers worried that it might 
affect their own digital preservation programs. 6 

• The Boston Herald reported on June 15, 2006, that 
simultaneously with the Shakespeare in the Park 
festival in New York City, Google is launching a Web 
site that allows users to search all of Shakespeare's 
plays and poems — which, of course, are in the public 
domain. 7

• A New York Times article on August 9, 2006, reported 
that the University of California would join the 
Google project, adding millions of books from the 
system's one hundred libraries. The digitization pro- 
gram will include copyrighted works. Google is talk- 
ing to other libraries as well. 8 

• According to the Milwaukee Journal Sentinel, October 
13, 2006, the University of Wisconsin has jumped on 
the Google bandwagon. 9 

• A few days later, the Financial Times (London) 
reported that Microsoft has a new partnership to scan 
books from Cornell University's library. Microsoft 
already has partnerships with the British Library and 
other library members of the OCA. 10 

As Van Orsdel and Born observed, perhaps the good 
news in all of these announcements is that book digitization 
projects have taken over the spotlight in the past two years 
and upstaged the serials crisis. 11 






Foolish Risk ... or Vision? 

We should be grateful to Google for sticking out its neck — 
for pushing the envelope on technological innovations, 
copyright, and other important aspects of digitization. At a 
symposium at the University of Michigan in March 2006, 
Google's Adam Smith said we need to "just do it" and "not 
let perfection be the enemy of the good," and that we need 
to "get it out there" — learn from mistakes, iterate the pro- 
cess, and make it better. 12 

On the other hand, Yahoo, Microsoft, the Library of 
Congress, and the OCA are staying in the background, 
which may be a good place to be. It not only is safer, but 
their smaller programs permit experimentation and policy 
setting to be done with older materials and those materials 
not under copyright. They are learning a lot, and introduc- 
ing technological innovations with projects that do not 
risk lawsuits. 

In regard to copyright, Google, publishers, and the par- 
ticipating universities all agree on the fundamental issue that 
intellectual property laws should be respected. Nonetheless, 
Google is being subjected to numerous lawsuits in the 
United States, France, Germany, and elsewhere because 
some publishers disagree on whether Google is infringing on 
copyright. Google is taking an extremely aggressive stance 
on copyrighted materials, insisting on an opt-out model that 
requires authors and publishers to contact Google and tell 
them they do not want their books included. Google says 
that opt-out is much easier, cheaper, and quicker than opt-in 
because Google would have to contact millions of copyright 
holders before even deciding which books they could digi- 
tize. Further, a large percentage of copyright holders would 
be virtually impossible to reach. This is one manifestation of 
the orphan works (copyrighted works whose owners may be 
impossible to identify and locate) problem. 13 

Authors and publishers say that Google is looking at this 
only from Google's perspective. What if a lot of companies 
and organizations — not just Google — get into large-scale 
digitization? That would put a big burden on the copyright 
holders who want to opt-out but might not even be aware 
that their books are being digitized. 

Justification for Digitizing Books 

Electronic journals and digitized versions of older print 
journals have become firmly established in research librar- 
ies' collections. Why digitize books as well? Twenty-first- 
century scholars are increasingly bypassing books — looking 
for background information in print library collections may 
slow down the scholar who wants to be productive. Even 
scholars in the humanities and social sciences are looking 
to their colleagues in the sciences, modeling their behavior 



LRTS 52(1) 



after them because all scholars want to save time and be 
more productive. Initially, historians were hostile to JSTOR 
(a trusted archive of scholarly journals), but now most find 
it extremely helpful in their research. 

College students use books and journals (at least if 
they have been trained to do so; otherwise, they simply 
use a search engine to find information on Web sites). For 
both students and scholars, however, the book is becoming 
increasingly irrelevant for learning and discovery. 

Looking back a few centuries provides a perspective on 
how the pace of change is forcing us radically and rapidly 
to rethink our assumptions about scholarship. The transi- 
tion from an oral to a written culture developed over many 
centuries. As Bengston said, 

During this slow evolution, our way of think- 
ing fundamentally changed, from repetitive, oral, 
memory-based knowledge to visual and spatial 
memory, based on the physical object of the book. 
For centuries books were simply the most efficient 
and usable technology for the transmission of cul- 
ture and ideas. We need only reflect on the past 
few years to sense how quickly and radically the 
ways that we write and communicate have been 
and will be altered. 14 

What do modern scholars and students really want 
or need? Have we factored their rapidly changing needs, 
preferences, and habits into our preservation programs? 
Predicting what, exactly, will happen to print books or even 
e-books in this century and beyond is impossible. Many 
people are confident that certain kinds of books "will cease 
to exist on paper: directories, reference works, textbooks, 
travel guides, to name a few." 15 No one can say, however, 
how much scholars and students will care about linear, nar- 
rative, book-length treatments. We do not know how much 
generations to come will care about preserving words, 
compared to visual and multimedia documents or even raw 
data. The only thing we may be certain of is there will be a 
tidal wave of interest in networked, digital media. Are our 
preservation programs responding to those trends? 

For some time now, libraries have paid attention to 
cooperating in digitization projects that focus on unique 
collections as a cost-efficient way to give scholars all over 
the world access to rich resources and to preserve those 
valuable print materials that were deteriorating. Just a 
little more than two years ago, Brian Lavoie and Lorcan 
Dempsey at OCLC Research admonished libraries to be 
very careful and wise in allocating their insufficient bud- 
gets for preservation. 16 Accordingly, OCA members are 
carefully selecting which materials to contribute. Google's 
general approach, on the other hand, has been to throw a 
lot of money at the problem and grab as many books as they 



52(1) LRTS 



Mass Digitization 21 



can without selecting particular parts of collections. Daniel 
Greenstein, director of the California Digital Library, was 
quoted in January 2006 as saying that his discussions with 
Google officials disclosed that they are "more interested 
in grabbing a large quantity of materials than in carefully 
selecting certain collections of works." 17 Their attitude is 
simply, "the more of it, the better." 18 This approach is not 
true at all universities participating in the Google project, 
but it is a general strategy. 

The question is whether a selective collection policy 
for digitization and preservation is better than a scattershot 
approach. It appears inevitable that the gems of our collec- 
tions are not going to be exploited unless we digitize them. 
But should we be aiming for digitizing everything as fast as 
possible, so that we can provide what Mary Sue Coleman,
president of the University of Michigan, referred to as 
"instant gratification of a one-in-a-million need," or should 
we be taking a more measured approach that addresses the 
most likely and important needs now and in the future? 19 I
propose that we need to think more carefully about preser- 
vation priorities and match them to the norms of twenty- 
first-century scholarship. We need to spend our scarce 
resources for those digitization activities that not only will 
increase access, but will serve our long-term preservation 
goals as well. 

Trust 

At a symposium titled "Scholarship and Libraries 
in Transition: A Dialogue about the Impacts of Mass 
Digitization Projects," held at the University of Michigan on 
March 10-11, 2006, Clifford Lynch, executive director of 
the Coalition for Networked Information, was the wrap-up 
speaker. 20 He proposed that digitization is a form of insur- 
ance — in fact, one of the best forms of insurance we have. 
He said it is not a replacement for the physical object, but 
increasingly a good (albeit not perfect) surrogate. But is it 
really? If a foreign army came marching through your town, 
would you be preserving documents by tossing them into 
the hayloft? They would be out of the way of the marauders, 
yes, but they still would be subject to thieves stealing them, 
mice nibbling away at them, and rain leaking through the 
roof. Preservation is much more than finding a compact, 
convenient, and inexpensive place to stash materials. 

The academy has enduring values and standards of 
preservation. Every academic library has in its mission state- 
ment something about archiving, conserving, or preserving 
the scholarly record for perpetual access. For example, 
on the University of Maryland Libraries' Web site is their 
mission statement: "Providing access to the use of the 
scholarly information resources required to meet the education,
research and service missions of the University. The
Libraries support this effort by building, organizing, maintaining
and preserving these resources." 21

Another way to express this statement of values comes 
from Coleman. In a speech to the Association of American 
Publishers in February 2006, she said, "General Motors 
does not need to maintain the tools for its 1957 Chevys, and 
would have a hard time manufacturing a car from that year. 
But a university is responsible for stewarding the knowledge 
of 1957, and for all the years before and after — the books 
and magazines; the widely known research findings and the 
narrow monographs; the arcane and the popular." 22 

Given that academic libraries accept a staggering respon- 
sibility with limited resources to meet that responsibility, 
they need to win people's trust that they will fulfill their 
mission to preserve evanescent digital materials. 23 They also 
need Google and other commercial enterprises as valued 
allies and partners. Kate Wittenberg, director of Columbia
University's Electronic Publishing Initiative (EPIC), affirmed 
libraries' dependence on the for-profits — "We need to face 
the fact that commercial search engines are now the mecha- 
nism of choice for finding information, and we desperately 
need Google and other powerful players as valued partners 
with whom we will negotiate effective ways of collaborating 
that benefit our businesses and our users." 24 

Nevertheless, librarians and archivists need to be care- 
ful. They may get chummy with the staff of the for-profits — 
their employees are awfully friendly people, and how can 
you dislike a company such as Google, whose official motto
is "Don't be evil"? 25 But we should never think of Google,
Microsoft, or Yahoo as one of us. Google has a corporate 
mission "to organize the world's information," but it is for 
the goal of building and sustaining a massive and highly 
profitable media empire. 26 Google may sincerely believe 
that it operates according to higher principles, but some of 
its recent actions, such as its decision to abide by political 
restrictions placed on it by the Chinese government, prove 
that it is willing to compromise its principles in order to stay 
competitive. 27 A lot of money is to be made in digitizing
books — when the content moves from physical to digital, its 
value jumps enormously. When there is money to be made, 
libraries and archives should be vigilant and alert. Too much 
chumminess with commercial enterprises raises three basic 
problems or issues related to trust: quality, secrecy, and 
long-term stability. 

Quality 

Quality is a serious issue in preservation that involves poor 
optical character recognition (OCR), poor originals that 
result in poor reproductions, missing pages, truncated text, 
and damage to the materials being digitized. 

Is mass digitization preservation? Yes or no? Apparently 
no consensus exists, even among the representatives of the
Google 5 (as of January 2007, the Google 7 and expanding).
Dale Flecker, associate director for Systems and Planning at 
the Harvard University Library, insists that the Google proj- 
ect is not planned as a preservation project; mass digitization 
is really only about providing access. 28 The attitude of the 
University of Michigan's administrators, however, is more 
complex. On the one hand, they acknowledge the serious- 
ness of the preservation problem. For example, Coleman 
reported that Michigan was one of nearly 3,400 institutions 
that took part in the massive Heritage Health Index, which 
assessed how well our cultural institutions are tending to 
some 4.8 billion artifacts — the majority of which are books 
held at libraries. 29 Coleman said that the findings that came 
out in December 2005 were discouraging, and she warned, 
"As a country, we are at risk of losing millions and mil- 
lions of items that constitute our heritage and our culture, 
because of a lack of conservation and planning. ... So con- 
servation efforts are paramount." 30 Michigan's response has 
been to create digital copies of works that are at-risk, out 
of print, or languishing in warehouses, an effort speeded 
up enormously because of the Google program. John Price 
Wilkin, University of Michigan associate university librarian, 
affirmed that Michigan thinks of it as a preservation proj- 
ect. 31 Even at Michigan, however, preservation and conser- 
vation staffs handle delicate materials that they feel are too 
fragile to scan. Michigan is not trusting the mass digitization 
program to protect their most vulnerable and valuable mate- 
rials because they know that it does sacrifice quality. 

Other Google partners have conceded quietly that the 
overall quality of the scans has not been great. Andrew 
Herkovic of Stanford University Library was quoted in an 
article by Helm in Business Week Online saying "Google 
has never pretended to knuckle under to quality demands 
that [preservationists] hope for." 32 In the same article, 
Sidney Verba, director of the Harvard University Library, 
said, "We at Harvard do a more careful and high-quality 
digitization when we do it for our own purposes, there's no 
question." 33 There is a question, however, whether Harvard 
is duplicating Google's digitization efforts. We also should 
ask why Google is not adhering to preservation standards 
when scanning. 

Most of the institutions participating in the Google 
project concede that the main benefit of this project is not 
preservation, it is access — especially to students and scholars 
who would never otherwise be aware of the content of these 
books. If the quality is not good enough to read online, the 
hope is that the users will go to the library and find the origi- 
nal book. Given what we know about twenty-first-century 
behaviors among students and scholars, however, do we 
believe that very many of them will seek information beyond 
what they can find on their desktops? Deanna Marcum, 
associate librarian for library services at the LC, eloquently 
portrayed a scenario of the college student who, working
from a "cozy, computer-equipped dorm room" can ignore
the library completely and write a term paper — albeit with 
some questionable resources — entirely based on resources 
found online through a commercial search service such 
as Google. 34 

In any case, the price is right — it is pretty much free. 
Herkovic at Stanford was quoted in Helm's article, "If 
we were paying for this, if we were driving the [quality 
specifications], they would be different from what Google is
offering." 35 Adam Smith, product manager of Google Books, 
responded in the same article by saying that "the primary 
goal right now is to put as much content online as possible, 
and address problems later." 36 

A news item in September 2006 reported that Google 
is turning to the greater engineering community for help 
improving the OCR technology it needs to index and archive 
books. 37 The technology Google is currently using is highly 
accurate at reading Latin characters, but still has trouble 
with other languages, handwriting, highly stylized fonts, 
smudged print, scientific treatises, and unique layouts. 
Google also has had problems with blurry or off-center 
scans that can confuse OCR engines and prevent the deci- 
phering of a document's letters and words. Those pages 
would, therefore, not be indexed. At least Google is admit- 
ting that it has a problem, and one hopes that improvements 
in OCR and scanning will benefit all of us. 

In the meantime, it appears that at least some materi- 
als being scanned will have to be scanned a second time — a 
waste of precious resources. Some researchers have suggest- 
ed that preservation could end up costing much more than 
the original digitization of the books. If Google, Yahoo, or 
Microsoft have answers to the tough questions surrounding 
preservation of digital documents, they have not announced 
them or published them yet. It seems safe to assume that 
libraries and archives must accept that the responsibility for 
preservation is still theirs. It is, therefore, vital for all of us 
to know what the libraries participating in mass digitization 
intend to do. 

Secrecy 

Google in particular has been secretive — even cultivating 
an aura of mystery — about such things as their own high- 
speed book scanner. They refuse to divulge details of how 
it works or how fast it scans books. Google also does not say 
how many books it has scanned so far, or which books have 
been scanned. 

Long-term Stability 

Did anyone see a headline in a recent Wall Street Journal 
that read "Google Files for Bankruptcy"? The news item
continued: 






Under the weight of too many lawsuits, rapid 
overextending of services (now more than twenty- 
nine different services), and mismanagement of 
its staggering empire, Google today filed for bank- 
ruptcy protection while it continues its operations. 
The chief executive of Google, who was recently 
appointed to the board of Apple Computer, said,
"This regrettable action became necessary only 
recently when good faith efforts to resolve out- 
standing debt with a creditor from the company's 
earliest days broke down." A spokesperson for 
Google declined to name the creditor. 

Did anyone see that shocking announcement? No — of 
course not — I made it up, and it is nonsense. Google has 
been for some time the number one search engine in the 
United States and Europe, and probably everywhere else 
in the world. Its market share is well ahead of Microsoft, 
Yahoo, and Ask.com. It has amassed nearly $10 billion 
in cash. 

However, a real news story appearing February 20, 
2006, in the Edge Singapore revealed that Google shares 
had dropped nearly 25 percent as the company has grappled 
with growing competition from Microsoft and Yahoo, and 
"there could be a lot more tumbling ahead" because the 
stock prices do not reflect what the company is worth. 38 
Google was facing increased pricing pressures on its online 
ad sales and mounting concern about what is known as click 
fraud as well as other challenges, such as lawsuits from 
newspaper and book publishers. It is not out of the realm 
of possibility that Google could shrink, redirect its mission, 
or even disappear altogether in the coming decades. These 
were the fates of other giants of American industry, such as 
Chrysler, IBM, and AT&T. 

Another real headline appearing on October 6, 2006, in 
the Washington Post read, "Google Seeks Info from Book 
Scanners." 39 According to the news item, "Google Inc. has 
issued subpoenas for detailed information about its rivals' 
book-scanning projects as part of its defense against lawsuits 
attacking its own plans to put the contents of entire librar- 
ies online." 40 The article noted that the subpoenas were 
sent to Yahoo Inc, Microsoft Corporation, the Association 
of American Publishers, HarperCollins Publishers Inc., 
Bertelsmann AG's Random House Inc., and Holtzbrinck
Publishers LLC. A similar request was also sent to Amazon.com
Inc. The subpoenas included a request for "documents
detailing every book the companies have made available 
online or plan to by the end of 2009" — the details are to 
include "lists of all authors, publishers, copyright holders 
and copyright status of each book scanned" as well as "all 
contracts or communications with publishers, copyright 
holders and libraries." 41 Does this tell us that Google is a 
little nervous? As of this writing, the targets of the subpoenas
had refused to cooperate with Google; apparently, they
feel the request for information is an attempt to capture 
trade secrets. 42 What would happen to mass digitization 
projects in research libraries if Google did collapse? Or if 
its stockholders decided that book search was a money loser 
and should be discontinued? 

Some promising developments are appearing in the 
area of electronic journal preservation. Portico, developed 
by JSTOR and its partners, takes the "trusted third-party"
approach. LOCKSS (Lots of Copies Keep Stuff Safe) dis- 
tributes the task of preservation through local caching of 
subscriptions. 43 One step further is CLOCKSS (Controlled 
LOCKSS), a not-for-profit network of institutions, including 
OCLC, LC, other research libraries, as well as many pub- 
lishers and learned societies. 44 The mission of CLOCKSS is 
to develop "a distributed, validated, comprehensive archive 
that preserves and ensures continuing access to electronic 
scholarly content." 45 Stemper and Barribeau provide a com- 
prehensive review of all the efforts being made to preserve 
electronic journals. 46 These initiatives, however, while offer- 
ing assurances that e-journals will be accessible far into the 
future, have not yet addressed the problem of preserving 
digital books. We know that libraries and archives have an 
avowed firm commitment to long-term preservation of and 
access to materials. As long as the major funders of our digi- 
tization efforts are commercial enterprises, however, can we 
count on sustainable access over the long term? 

Leadership 

A June 2004 report from the Association of Research
Libraries endorsed digitization as an "accepted preserva- 
tion reformatting option for a range of materials." 47 The 
report conceded that "ensuring high-quality image capture 
and providing for the long-term viability of digital objects 
is an admitted challenge." 48 The information professions, 
however, must take the leadership in developing standards 
and best practices, including developing "strategies to keep 
master files safe for the short-term, [which includes] the use 
of high-quality and reliable storage media, multiple back-up 
systems, periodic testing, and a schedule to refresh data." 49 
These short-term strategies will at least keep the materials 
safe — safer than in the hayloft — while long-term solutions 
are being developed. This is the proper leadership role for 
librarians and archivists. Are we up to it, or will we let the 
eight-hundred-pound-gorilla companies drive the agenda 
and set the priorities? More specifically, are we even con- 
cerned that the gorillas are not dealing with preservation? 
A search in Lexis-Nexis for articles in the general and business
news sections uncovered hundreds of hits on the topic
of "(Google OR Yahoo) AND digitization AND books."
However, as soon as the word "preservation" was introduced
into the search string, the count dropped to zero. Shouldn't
that worry us? 

We need to stop being reactive; we need to go after the 
preservation target in a strategic way. We own this problem 
of preserving books ... at least for now. Ironically, many 
librarians were unhappy with a 2005 OCLC report because 
one of their key findings was that in the public's eye, the 
library brand is books. 50 That finding is troubling if people 
think of libraries as only about books. But we will be in 
much more trouble if our users stop thinking even that. Do 
you want a book? Go to Google or Yahoo or Amazon.com. 
Where will libraries be then? No brand recognition at all! 

A news item on October 24, 2006, reported that 
Google, in partnership with the Frankfurt Book Fair literacy 
campaign and UNESCO's Institute for Lifelong Learning, 
is launching an online portal to connect literacy organiza- 
tions. 51 In addition to allowing "organisations, teachers and 
others with an interest in literacy to search online for and 
share literacy information," the tool provides a zoomable, 
searchable map that enables users to locate literacy organi- 
zations around the world. 52 Searchers could find information 
in academic articles and digitized books, and share the infor- 
mation they find via groups, videos and blogs. 

This is leadership on a scale that only a huge organiza- 
tion with extraordinarily deep pockets, a focused mission, 
and amazingly creative ideas can hope to mount. We have 
to applaud Google for this leadership, but are we also a little 
jealous, or worried? Perhaps we should be, if not the former, 
at least the latter. 



Summary Thoughts 

I have raised many questions without supplying answers. 
Why? Because this is all happening so fast. Research libraries
with a mission to preserve collections and make them
accessible to future generations will be affected, but we do 
not know exactly how yet. My cautions in each of the five 
areas are: 

• Pace. We cannot slow it down; the pace car is Google, 
and other commercial drivers are nearly as pushy. But 
we must find the time to digest it all before making 
irreversible decisions about our precious collections. 

• Risks. Let Google and others take risks if they wish, 
but we should not be taking risks with our collections, 
nor should we be risking our users' free access to our 
collections in the future. 

• Justification for digitizing books. Given that we all 
have to get on this bandwagon, and given that we all 
have limited resources to do so, we should be think- 
ing of ways to maximize value for scholars now and in 
the future. What sorts of materials will be of the most
interest into the far-distant future? Can we afford to
digitize older materials that are the "Long Tail" of 
our collections — items that will appeal only to a small 
number of researchers, or only one, or maybe none 
at all? 53 

• Trust. We must find partners — sister libraries, com- 
mercial entities, government agencies, and others. 
But we must keep a clear head about which of those 
organizations will be around for the long term, which 
of them share our values and our mission, and which 
truly understand preservation and conservation issues 
in regard to fragile, valuable, endangered, and irre- 
placeable artifacts. The Open Content Alliance model 
keeps a lot of control under the contributing libraries, 
where it should be. 

• Leadership. The leadership needs to come from all 
parties in this endeavor. Because we are all mutu- 
ally dependent, no one organization is in a position 
to dictate the discussions or the outcomes. Google 
and Yahoo need our content. We need stable, robust 
technology platforms for preservation and wider use 
of our collections. Scholars and students need more 
access and knowledge about how to use these collec- 
tions. We all need to stay in close communication and 
collaboration . . . with our eyes wide open! In the end, 
research libraries alone will be held accountable for 
fulfilling that vital preservation mission. 

References and Notes 

1. Katie Hafner, "At Harvard, a Man, a Plan and a Scanner," 
The New York Times, Nov. 21, 2005, Section C; Column 
2, Business/Financial Desk, www.nytimes.com/2005/11/21/
business/21harvard.html?ex=1290229200&en=86f7d416af
4055ed&ei=5090&partner=rssuserland&emc=rss (accessed
Mar. 31, 2007). 

2. Open Content Alliance, www.opencontentalliance.org 
(accessed Mar. 31, 2007). 

3. Siva Vaidhyanathan, "A Risky Gamble with Google," Chronicle 
of Higher Education 52, no. 15 (Dec. 2, 2005): B7. 

4. Jon Boone and Maija Palmer, "Microsoft in Deal with British 
Library to Add 100,000 Books to the Internet," Financial 
Times, Nov. 3, 2005, front page, first section, http://search 
.ft.com/ftArticle?queryText=microsoft+british+library&aje 
=true&id=051104001019 (accessed Jan. 29, 2007). 

5. James H. Billington, "A Library for The New World," The 
Washington Post, Nov. 22, 2005, A29, www.washingtonpost
.com/wp-dyn/content/article/2005/11/21/AR2005112101234
_pf.html (accessed Jan. 29, 2007).

6. "E-Library for Europe," The Australian, Mar. 7, 2006, IT 
Broadsheet, 2. 

7. Google Book Search, "The Complete Plays of Shakespeare 
Now at Your Fingertips," www.google.com/shakespeare 
(accessed Mar. 31, 2007); Jesse Noyes, "Shakespeare Has 
Google Web Site," The Boston Herald, June 15, 2006, 
Finance, 43. 






8. Motoko Rich, "Arts, Briefly; Google Snags another Library," 
The New York Times, Aug. 9, 2006, Section E, Column 5, 
Page 2, The Arts/Cultural Desk. 

9. Megan Twohey, "UW Deal Will Put Books Online," Milwaukee 
Journal Sentinel, Oct. 13, 2006, www.findarticles.com/p/
articles/mi_qn4196/is_20061013/ai_n16786801 (accessed
Mar. 31, 2007). 

10. Rebecca Knight, "Microsoft in Digital Book Deal," Financial 
Times, Oct. 18, 2006, http://search.ft.com/ftArticle?queryText= 
microsoft+cornell+scan+books&aje=true&id=061018001258 
(accessed Jan. 29, 2007). 

11. Lee C. Van Orsdel and Kathleen Born, "Journals in the
Time of Google," LibraryJournal.com (Apr. 15, 2006), www 
.libraryjournal.com/article/CA6321722.html (accessed Mar. 
31, 2007); Adam Smith, "Google's Perspective" (informal 
presentation at "Scholarship and Libraries in Transition: A 
Dialogue about the Impacts of Mass Digitization Projects," 
University of Michigan, Ann Arbor, Mar. 10-11, 2006). 

12. U.S. National Commission on Libraries and Information 
Science, Mass Digitization: Implications for Information 
Policy: Report from "Scholarship and Libraries in Transition: 
A Dialogue about the Impacts of Mass Digitization Projects, " 
Symposium held on March 10-11, 2006 at the University of 
Michigan, Ann Arbor, MI. May 9, 2006 (Washington D.C.: 
NCLIS, 2006), 11. 

13. Ibid. 

14. Jonathan B. Bengtson, "The Birth of the Universal Library,"
Library Journal Net Connect (Apr. 15, 2006), www.libraryjournal
.com/article/CA6322017.html (accessed Mar. 31, 2007).

15. Andrew Richard Albanese, "The Social Life of Books," 
Library Journal (May 15, 2006): 28-30. 

16. Brian Lavoie and Lorcan Dempsey, "Thirteen Ways of 
Looking at . . . Digital Preservation," D-Lib Magazine 10, 
no. 7/8 (2004), www.dlib.org/dlib/july04/lavoie/07lavoie.html
(accessed Jan. 29, 2007). 

17. Jeffrey R. Young, "Scribes of the Digital Era," The Chronicle 
of Higher Education 52, no. 21 (Jan. 27, 2006): 34. 

18. Ibid. 

19. Mary Sue Coleman, "Google, the Khmer Rouge, and 
the Public Good" (address to the Professional/Scholarly 
Publishing Division of the Association of American Publishers, 
Washington, D.C., Feb. 6, 2006), www.umich.edu/pres/ 
speeches/060206google.html (accessed Jan. 29, 2007). 

20. Clifford Lynch, "Web Cast: Closing Remarks," delivered at 
"Scholarship and Libraries in Transition: A Dialogue about 
the Impacts of Mass Digitization Projects," Mar. 11, 2006, 
www.lib.umich.edu/mdp/symposium/lynch.html (accessed 
Mar. 31, 2007). 

21. University of Maryland Libraries. "Libraries Mission," www 
.lib.umd.edu/deans/index.html (accessed Mar. 31, 2007). 

22. Coleman, "Google, The Khmer Rouge, and the Public 
Good." 

23. Peter Hart and Ziming Liu, "Trust in the Preservation of 
Digital Information," Communications of the ACM 46, no. 6
(2003): 93-97. 

24. Kate Wittenberg, "Beyond Google: What Next for Publishing?" 
The Chronicle of Higher Education 52, no. 41 (June 16, 
2006): 20. 



25. Google, "Google Code of Conduct," http://investor.google
.com/conduct.html (accessed Mar. 31, 2007). 

26. Google, "Google Corporate Information, Company Overview," 
www.google.com/corporate (accessed Mar. 31, 2007). 

27. This issue is discussed in articles in The Independent (London), 
July 20, 2006; Irish Times, July 20, 2006; and San Francisco 
Chronicle, July 21, 2006. 

28. "Google Print for Libraries: The Bold and the Cautious," 
LibraryJournal.com, www.libraryjournal.com/article/CA22558 
.html (accessed Oct. 22, 2006). 

29. Coleman, "Google, the Khmer Rouge and the Public Good"; 
A Public Trust at Risk: The Heritage Health Index Report on 
the State of America's Collections (Washington, D.C.: Heritage 
Preservation — The National Institute for Conservation, 2005), 
www.heritagepreservation.org/hhi/full.html (accessed Mar. 
31, 2007). 

30. Coleman, "Google, the Khmer Rouge and the Public Good." 

31. U.S. National Commission on Libraries and Information 
Science, Mass Digitization. 

32. Burt Helm, "Google's Great Works in Progress," Business
Week Online, Dec. 22, 2005, www.businessweek.com/ 
technology/content/dec2005/tc20051222_636880.htm 
(accessed Oct. 21, 2006). 

33. Ibid. 

34. Deanna B. Marcum, "The Future of Cataloging," Library 
Resources & Technical Services 50, no. 1 (2006): 6.

35. Helm, "Google's Great Works in Progress." 

36. Ibid. 

37. Catherine Holahan, "Google Seeks Help with Recognition," 
Business Week Online, Sept. 7, 2006, www.businessweek
.com/technology/content/sep2006/tc20060907_732714
.htm?chan=top+news_top+news+index_technology (accessed
Oct. 21, 2006). 

38. Jacqueline Doherty, "Barron's: Google Trouble?" The Edge 
Daily, Feb. 21, 2006, www.theedgedaily.com/cms/content 
.jsp?id=com.tms.cms.article.Article_8b5a2df4-cb73c03a
-23d27500-d95569df (accessed Jan. 29, 2007). 

39. Jessica Mintz, "Google Seeks Info from Book Scanners," 
Washington Post, Oct. 6, 2006. 

40. Ibid. 

41. Ibid. 

42. Keith Regan, "Yahoo Snubs Google in Digital Book Copyright 
Case," E-Commerce Times, Nov. 30, 2006, www.ecommercetimes
.com/story/54494.html (accessed Oct. 21, 2006).

43. LOCKSS, www.lockss.org/lockss/Home (accessed Mar. 31, 
2007). 

44. CLOCKSS, www.lockss.org/clockss/Home (accessed Jan. 29, 
2007). 

45. Ibid. 

46. Jim Stemper and Susan Barribeau, "Perpetual Access to 
Electronic Journals: A Survey of One Academic Research 
Library's Licenses," Library Resources & Technical Services 
50, no. 2 (2006): 91-109. 

47. Kathleen Arthur et al., Recognizing Digitization As a Preser- 
vation Reformatting Method (Washington D.C.: Association 
of Research Libraries, 2004), 2. 

48. Ibid., 3. 

49. Ibid., 4. 






50. OCLC, Perceptions of Libraries and Information Resources:
A Report to the Membership (Dublin, Ohio: OCLC, 2005),
www.oclc.org/reports/2005perceptions.htm (accessed Jan. 29,
2007).

51. Ibid.

52. "Google Launches Online Literacy Project," Business and
Industry, Oct. 4, 2006 (accessed in Lexis-Nexis [proprietary
database] Oct. 21, 2006).

53. Chris Anderson, "The Long Tail," Wired Magazine 12, no.
10 (Oct. 2004), http://web.archive.org/web/20041127085645/
http://www.wired.com/wired/archive/12.10/tail.html2004
(accessed Mar. 31, 2006).



Statement of Ownership, Management, and Circulation 

Library Resources & Technical Services, Publication No. 311-960, is published quarterly by the Association for Library Collections & Technical
Services, American Library Association, 50 E. Huron St., Chicago (Cook), Illinois 60611-2795. The editor is Peggy Johnson, Associate University
Librarian, University of Minnesota, 499 Wilson Library, 309 19th Ave. South, Minneapolis, MN 55455. Annual subscription price, $75.00. Printed
in U.S.A. with periodicals-class postage paid at Chicago, Illinois, and at additional mailing offices. As a nonprofit organization authorized to mail at 
special rates (DMM Section 424.12 only), the purpose, function, and nonprofit status of this organization and the exempt status for federal income 
tax purposes have not changed during the preceding twelve months. 

(Average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote actual 
number of copies of single issue published nearest to filing date: July 2007 issue.) Total number of copies printed: average, 6,199; actual, 6,185. 
Sales through dealers, carriers, street vendors and counter sales: average, none; actual, 512. Mail subscription: average, 5,014; actual, 4,973. Free
distribution: average, 196; actual, 189. Total distribution: average, 5,670; actual, 5,674. Office use, leftover, unaccounted, spoiled after printing: 
average, 529; actual, 511. Total: average, 6,199; actual, 6,185. Percentage paid: average, 96.54; actual, 96.67. 

Statement of Ownership, Management and Circulation (PS Form 3526, September 2007) for 2006/2007 filed with the United States Post 
Office Postmaster in Chicago, October 1, 2007. 






Jeffrey L. Horrell (Jeffrey.L.Horrell@
Dartmouth.edu) is Dean of Libraries
and Librarian of the College, Dartmouth 
College, Hanover, New Hampshire. 

This article is based on a presentation 
given at "Converting and Preserving 
the Scholarly Record," Eighth Annual
Symposium on Scholarly Communication, 
State University of New York, Albany, 
October 24, 2006. 



Submitted February 12, 2007; tentatively 
accepted pending revision April 2, 2007; 
revised and resubmitted June 30, 2007,
and accepted for publication.



Converting and 
Preserving the 
Scholarly Record 

An Overview 

By Jeffrey L. Horrell 

The author provides an overview of the issues related to preservation in the digi- 
tal environment and describes initiatives that promise to address these issues. He 
considers the mutability of electronic content, the mission of libraries to preserve, 
the long-term ownership of digital content, the nature of preservation in a digital 
age, and promising digital preservation initiatives. The paper, drawing on the 
work of a collaborative Duke/Dartmouth Mellon-sponsored project, concludes
with recommended elements for a campuswide digital repository. 

To set the stage for this article, let me begin by sharing my first experience 
of realizing what it meant to preserve or potentially lose scholarly content. 
Picture a library school (when they were called library schools) student in his 
first semester, working at the reference desk of a major research institution on 
a Sunday evening, when a user appears at the desk with a handful of author 
catalog cards clearly ripped from the public card catalog (again, when there 
were such things), presents them, and quite casually asks, "Where do I find 
these books?" The shock of what I was witnessing was overwhelming to a soon- 
to-be librarian! I quickly composed myself, politely offered a stacks chart, and 
offered to write down the call numbers as I extended my hand to retrieve and 
secure the cards. I can remember the almost reluctant tug of the cards from 
the individual. The transaction ended smoothly, with the cards in my hand 
and the person off to find the materials. At that moment, in a very small, but 
what seemed dramatic way, I felt what the responsibility for preserving access 
to content really meant. No doubt, scholarship as we know it would not have 
come to a standstill without that handful of cards, but the potential loss might, 
indeed, have had an effect. 

Fast-forward several decades, and ask yourself if you have had the experience
of searching on the Internet, locating a Web site using Google or Yahoo, and
returning to search for it again to find that the site has disappeared. Have you 
expected to find particular information on a Web site, but cannot because the site 
has been updated and there is no easily accessible archive? Have you encountered 
a difference between the print and electronic versions of a document and won- 
dered which one is correct? There are countless examples of this "now you see it, 
now you don't" phenomenon. 

Consider Britannica Online. For decades, print editions of the Encyclopaedia 
Britannica have been considered authoritative and reliable, but were out of date 
soon after publication. By comparing one edition with subsequent editions, 
one could see when a topic was introduced or an entry changed. With the 
electronic version, however, changes are often transparent, and 
information moves in and out of the work without notice. 
Another example is the online version of the journal Nature, 
which several years ago embargoed or delayed publishing 
parts of its content in the electronic version. In some ways, 
this practice could be an effective, strategic marketing 
initiative designed to preserve the print subscription base, 
but nonetheless, it was not clear if and when the content of 
the online version was complete. Finally, I think we can all 
point to redesigning or writing over Web sites in our own 
institutions, if not our libraries, and in the process losing 
content that traditionally had been preserved in print but 
was lost in subsequent electronic versions. 

In the predigital era, an understanding of what actually 
constituted a record was less complicated, and libraries, 
together with records management units, provided pres- 
ervation and archival services for their institutions that, 
in large part, preserved our history and ensured regula- 
tory compliance. With multiple libraries acquiring the 
same titles, redundancy was a safeguard for the materials. 
Volumes and archival records rested neatly on shelves in our 
libraries or in climate-controlled storage facilities. Guthrie, 
former head of JSTOR and now head of Ithaka, has pointed out 
that costs certainly were associated with preserving the 
print copies beyond simply storage, including maintaining 
the physical environment, shelving, repair, microfilming, 
and replacement in some instances. 1 But with the pace at 
which new digital formats are replacing print, photographic 
film, video, and audio recordings, combined with 
extraordinary amounts of digital data that can be readily 
acquired or licensed, stored, and disseminated, institutions 
and society are challenged to consider ways of maintaining 
archives of digital objects of all descriptions. 

As we think about this challenge, I believe it is important 
to remind ourselves why preserving information is important 
in the first place — and this takes us to our mission. The basic 
elements of the mission of an educational institution include 
researching, teaching, publishing research outcomes, and 
maintaining these results in a record of some sort. Data 
replication is essential and at science's core. Building on 
the scholarly record is central to the humanities and social 
science traditions. The mission statements of our libraries 
reflect the same elements. Dartmouth's is as follows: 

The Dartmouth College Library fosters intel- 
lectual growth and advances the teaching and 
research missions of Dartmouth College by sup- 
porting excellence and innovation in education 
and research, managing and delivering scholarly 
content, and partnering in the development and 
dissemination of new scholarship. 2 



Managing the Scholarly Content 
Means Preserving It 

Waters, program officer of scholarly communications at 
The Andrew W. Mellon Foundation, in a 2006 paper titled 
"Managing Digital Assets in Higher Education: An Overview 
of Strategic Issues," indicated that dissemination, preservation, 
and access refer to the life cycle of scholarly resources 
that are used and produced in teaching and research, and 
are the objects of scholarly communications. 3 He went on to 
outline a serious and deeply troubling scenario that centers 
on the transition from print to electronic publishing and 
from owning to licensing information. When a library purchased 
materials outright, it could do with them as it liked within 
the guidelines of copyright. There were instances of libraries 
giving content to microfilm publishers and then buying back 
the products, and of libraries allowing publishers to convert 
their microfilm content to digital format and then licensing 
it. Now, in some cases, this content is maintained on remote 
systems controlled by publishers. In effect, libraries and 
their institutions have an ongoing mortgage for the content 
that they owned in the first place, in print. No doubt access 
is improved, but something is seriously wrong with this 
model. Waters argued that one could see a business model 
for publishers that offers data-mining services for the large 
aggregation of content that could enable greater opportuni- 
ties for scholarship. In an institution of higher education, 
one could easily imagine preservation at the center of such 
an endeavor, but is there a compelling business interest for it 
in a profit-driven company? The result could be that libraries 
will not own or have the rights to the scholarly products, 
nor will they have a true archive of them, and publishers 
could apply whatever pricing model they choose for such 
data-mining services. This presents a scenario where the 
control of the scholarly record moves from the academy to 
the publishing, and mostly for-profit, sector. 

In addition to considering our mission in light of the 
business model just described, we also are faced with new 
questions about the nature of document preservation. What, 
exactly, is preservation? For example, can we say that a doc- 
ument has been preserved if we save the text, but our digital 
systems cannot reproduce its original typeface or style? 4 
Related issues surrounding the context and thinking behind 
manuscripts or policies that were once captured in letters, 
memoranda, drafts, and other ancillary documents need to 
be considered in a world driven by e-mail and instant messaging. 
As a society and as educational institutions, we have 
a collective responsibility to preserve and make available, 
along a continuum of a life cycle, our digital heritage, but 
an understanding of what preservation means in the digital 
world is complicated. 



Promising Initiatives 

How do we proceed? Several examples of promising preservation 
initiatives are worth noting. One that began several 
years ago and now has more than forty libraries as partners 
is Lots of Copies Keep Stuff Safe (LOCKSS). 5 It also has 
the engagement and support of more than thirty publishers. 
The goal of LOCKSS is to provide a low-cost, low-tech 
system of ensuring continued access to journal literature. It 
collects newly published content using a Web crawler simi- 
lar in nature to those used by commercial or other search 
engines. It compares the content it has collected with the 
same content on other distributed computers and repairs or 
reconciles any differences. Earlier this year, a project called 
Controlled LOCKSS (CLOCKSS), which uses the LOCKSS 
methodology, was developed as a dark archive intended to 
serve as a fail-safe repository for this content. 6 The content 
from CLOCKSS would only be used in the event that it was 
no longer available from the publisher. A group represent- 
ing publishers, learned societies, and libraries would be 
responsible for deciding the trigger conditions by which the 
content could, or should, be made available. 
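
The compare-and-repair idea described above for LOCKSS can be illustrated with a minimal sketch. It is not the actual LOCKSS protocol, whose polling and repair rules are not given in this article. It assumes a simple strict-majority vote over SHA-256 fingerprints, and the function and peer names (`audit_and_repair`, `library_a`, and so on) are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare every peer's copy and overwrite any that disagree with the majority.

    `copies` maps a hypothetical peer name to the bytes that peer holds.
    Returns the names of the peers whose copies were repaired.
    Assumption: a strict majority of identical copies is treated as correct.
    """
    votes = Counter(digest(content) for content in copies.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(copies) / 2:
        # No strict majority: leave everything untouched rather than guess.
        return []
    good_copy = next(c for c in copies.values() if digest(c) == winning_digest)
    repaired = []
    for peer, content in copies.items():
        if digest(content) != winning_digest:
            copies[peer] = good_copy  # replace the damaged copy in place
            repaired.append(peer)
    return repaired


if __name__ == "__main__":
    holdings = {
        "library_a": b"Volume 52, no. 1, full text",
        "library_b": b"Volume 52, no. 1, full text",
        "library_c": b"Volume 52, no. 1, full t3xt",  # simulated damage
    }
    print(audit_and_repair(holdings))
```

The sketch shows why redundancy matters: a damaged copy can be detected and restored only because other institutions hold independent copies of the same content.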

Another important preservation initiative is Portico. 7 
Sponsored by The Andrew W. Mellon Foundation, Ithaka, 
the Library of Congress (LC), and JSTOR, Portico provides 
limited access for audit purposes and institutionwide 
access in the event content is no longer available from a 
participating publisher. Portico intends to provide a reliable 
methodology for ongoing access to an institution's scholarly 
collections. A growing number of large and important 
publishers are partnering with Portico, including Elsevier, 
Oxford University Press, the University of Chicago Press, 
John Wiley and Sons, the UK Serials Group, the American 
Anthropological Association, and the Berkeley Electronic 
Press, with more than five thousand journals already slated 
to be archived. Nearly two hundred libraries of varying 
sizes and missions have either become partners with 
Portico or are seriously considering becoming involved. 

The developing work of LC and its National Digital 
Information Infrastructure and Preservation Program 
(NDIIPP) also is significant. 8 NDIIPP's mission is to work 
closely with a number of federal agencies and private 
partners to provide a national focus on important policies, 
standards, and technical components required to preserve 
digital content. Various options and technical solutions are 
being explored and tested. With an appropriation of nearly 
$100 million, LC will play an important part in helping to 
address these issues. A recently announced Web site related 
to this work is the WEB Capture site. 9 Since 2000, LC has 
been selectively capturing and preserving sites in such areas 
as Hurricane Katrina, recent Supreme Court nominations, 
and the transition subsequent to the death of Pope John 
Paul II. Current projects include the crisis in Darfur, Sudan, 
the Iraq War, and the 2006 elections. Also, related to the 
earlier description of electronic journal archiving initiatives, 
LC and the British Library, under NDIIPP's auspices, have 
agreed to support a common archiving standard for electronic 
content migration. 10 Finally, the MetaArchive Project, 
a collaborative venture of eight institutions, including LC, is 
a three-year project to develop an infrastructure to capture 
at-risk digital content related to the culture and history of 
the American South. 11 

Also on the federal level, the National Archives and 
Records Administration and the San Diego Supercomputer 
Center have recently agreed to work together to preserve 
valuable digital collections. 12 This unprecedented part- 
nership between the National Archives and an academic 
institution marks an opportunity for securing critical data 
created for, or by, agencies of the United States federal gov- 
ernment's executive branch. 

Brewster Kahle's Internet Archive, through its Wayback Machine, 
is another effort to archive digital information, specifically 
Web pages. 13 Begun a decade ago, it provides the ability 
to browse, not search, more than 55 billion Web pages; by 
design, it is neither comprehensive nor efficient in mining its 
content. However, the Internet Archive has collaborated 
with the Smithsonian and LC, and has developed a number 
of important collections, including the United Kingdom 
Central Government Web Archive, a collection devoted to 
sites instrumental in the early development of the Internet, 
and a number of election sites. 14 It now offers a Web Archive 
on Demand Service, which is a subscription-based archiving 
service targeted to a range of institutions at costs lower than 
some other archiving platforms. Called Archive-It, the service 
lets subscribers capture, organize, and theoretically preserve material 
from the Internet as well as from their own institutions and 
collections. Users can then search these collections fairly easily. 15 



Campus-wide Asset Management: 
The Duke/Dartmouth Project 

Work is nearing completion as part of a shared planning grant 
undertaken by Duke University and Dartmouth College and 
sponsored by The Andrew W. Mellon Foundation. Many 
projects are underway at a variety of research institutions, 
but they have not necessarily been undertaken in the con- 
text of a broad, campuswide asset management plan. Rather 
than looking for technological solutions, the focus of this 
grant has been on designing institutional strategies and poli- 
cies for managing scholarly and administrative assets in digi- 
tal form. Taking an institutional approach potentially brings 
advantages: a common understanding of the value of asset 
management, a shared commitment to building a campus- 
wide repository, economies of scale, and the possibility of a 
consistent set of policies that will apply to a wide range of 
materials. The decentralized nature of most institutions and 
the resulting silos of content, with the policies and proceedings 
associated with them, are a considerable barrier to a common 
view of the overall challenge. Cultural obstacles also 
come into play. Faculty members are not accustomed to, or 
always comfortable with, placing their work in an institutional 
repository and have traditionally managed it themselves. 
Funding also is a challenge. Without a comprehensive plan 
to support the development of the systems, it is difficult to 
contemplate a successful outcome. 

An institutional program, as identified in the collabora- 
tive planning project at Duke and Dartmouth, has several 
elements: 16 

• Structure. An overall program to manage a university's 
digital assets requires a formal organizational structure 
that is part of the institution's overall administration. 
There should be a steering committee with broad 
oversight. The committee's charge should include 
responsibility to research, develop, and implement an 
enterprisewide program for the institution. Policies 
will need to be established in addition to hiring 
staff, charging subgroups, developing implementation 
teams, and assigning accountabilities to individuals 
and groups across the institution. Three areas are key: 
priorities, policies, and implementation. 

• Priorities. Setting priorities for attention and resourc- 
es is essential. A census of asset areas should be 
developed to serve as the basis for priority setting. 
Information considered vital to the operation of the 
institution, or of irreplaceable value, will be of highest 
priority. There may be stopgap measures necessary 
to prevent loss. Timing may well be critical in these 
instances, and the committee will need to be flexible 
and act quickly. 

• Policies. A set of principles and policies for digital 
asset management for the university will provide 
continuity and clarity for its users and stakeholders: 

• Think globally. Because of the scope and com- 
plexity of the effort, it is important initially to 
think globally and act locally. Decisions made in 
designing and implementing specific solutions 
should take into account issues of scalability, insti- 
tutional capability, future data migration, available 
support, and overall institution efficiency. 

• Incentives for participation. While the digi- 
tal records of university administration are 
clearly owned by the university (and staff can 
be required to deposit them in an asset man- 
agement system), the issues are not so clear for 
faculty, as intellectual property ownership and 
workflow processes are much less straightfor- 
ward. Therefore, planners should carefully think 
through how to make it clear that participation 
in a university digital asset management program 
is in the best interest of faculty, not just of the 
university. It should be self-evidently useful to 
all whose participation is required for its success, 
and should meet their needs and save them time. 

• Confidentiality and openness. Planning for a uni- 
versity digital asset management system should 
include development of policies that differentiate 
among a variety of use cases and provide for dif- 
ferent levels of access and security, depending on 
the submitter's and university's needs for open- 
ness or confidentiality in those cases. 

• Terms of use, stewardship, and governance. The 
policies that govern the rights and responsibilities 
of the various program stakeholders must find a 
way to balance potentially conflicting require- 
ments between depositors and the institution, and 
account for changing needs as digital assets mature 
through different phases of their life cycle. 

• Implementation. While each college or university 
may have different dynamics, there are some points 
that will improve the chances of success in most aca- 
demic environments: 

• Sponsorship and the Digital Asset Management 
Steering Committee. The first step in establish- 
ing an enduring institutional culture of informa- 
tion stewardship is to secure explicit, high-level 
endorsement and support. A sponsor or spon- 
sors at the level of the provost or executive vice 
president can facilitate a good start and ensure 
ongoing support. The second step is to create a 
broadly based steering committee to manage the 
program's establishment and foster collaboration 
across the university. 

• Steering committee activities. The steering committee 
will need to assign tasks to short-term 
teams. While the committee should retain the 
priority-setting and oversight activities, the 
development of day-to-day practices and support 
should be tasked to functional groups, such as 
administrative departments, the library, and infor- 
mation technology organizations. 

• Establishing a permanent organizational struc- 
ture. Informed by the work of the program 
coordinator and start-up teams, the steering com- 
mittee needs to identify a permanent organiza- 
tional home for the program within the university. 

• Commonalities and differences among differ- 
ent parts of the organization. As the system is 
developed, it must be responsive to the diverse 
requirements of distinctive units of the institu- 
tion. The differences in managing material for 
teaching, research, and administration can be 
dramatic. Intellectual property and access issues 
arise frequently on the academic side, while 
administrative data may need to be summarized 
and analyzed on an ad hoc basis by many organi- 
zations on campus. 

• System attributes. Attributes of an underlying 
technology system can influence the program's 
success or failure. The steering committee should 
identify the technology characteristics needed for 
success, including: 

• The technology that supports the digital asset 
management effort will evolve over time. 

• The institution must retain the ability to 
export and migrate materials to new technol- 
ogy as systems evolve. 

• It must be possible to remove materials from 
the system. 

• The time it takes to access materials must be 
perceived as reasonable. 

• There must be appropriate tools to access, 
analyze, or transform the materials stored. 

• There must be selective degrees of access 
provided to materials. 

• Policies must provide guidance for administrators, 
faculty, and other users of the system 
on what can and cannot be added, as well 
as how long materials should be retained. 

• Measuring success. The steering commit- 
tee should require an evaluation/assessment 
model to be developed to measure the suc- 
cess of the endeavor. 

• A final key role of the steering committee 
and the sponsors is to assign responsibilities 
and identify institutional custodian(s) of the 
materials. 



Concluding Thoughts 

Ultimately, we must provide a secure infrastructure to 
ensure the enduring viability of digital content for the busi- 
ness aspects of our institutions and for the intellectual assets 
produced and acquired by and for the scholarly communi- 
ty. 17 Future generations of students, faculty, administrators, 
and scholars are depending on us. The Duke/Dartmouth 
report, submitted to the senior leadership at Dartmouth, is 
a call to action. Wess Jolley, Dartmouth's records manager, 
describes it as follows: 

It is a call for leadership within our institution 
in response to a critical need. Inaction based on 
concern about the potential difficulties is not an 
option. Indeed, the accelerating transition from 
paper to digital records has already caused the irre- 
versible loss of vital historical information. Unless 
we act decisively and immediately, the first years 
of the twenty-first century will be forever known 
as the era of lost history. Besides the intellectual 
impact, from a legal standpoint the transition to 
digital record keeping is a ticking bomb for our 
institutions. As digital systems replace paper, our 
carefully formulated records retention programs 
are becoming null and void. Without a digital 
equivalent to lifecycle controls traditionally estab- 
lished for paper records, each new digital record is 
a potential legal liability, and our ability to conduct 
the business of our institution becomes increas- 
ingly difficult. Simply put, we are losing control of 
our records with every passing day. 18 

A centralized approach to digital asset preservation can 
reduce legal liability and begin to ensure digital records are 
captured, maintained, disposed of, and preserved over time. 
We should not underestimate the scale, scope, and ongoing 
nature of this task, and we cannot disregard or fail to meet 
this challenge. Too much is at risk and at stake. Now is the 
time for leadership and action. 

References 

1. Kevin Guthrie, "Archiving in the Digital Age," EduCause 
Review (Nov./Dec. 2001): 57-65, www.educause.edu/ir/ 
library/pdf/erm0164.pdf (accessed Jan. 16, 2007). 

2. Dartmouth College Library, "Dartmouth College Library 
Mission and Goals Fiscal Year 2005-2006," www.dartmouth 
.edu/~library/col/DCL_Mission_Goals.pdf (accessed Jan. 16, 
2007). 

3. Donald J. Waters, "Managing Digital Assets in Higher 
Education: An Overview of Strategic Issues," Association 
of Research Libraries Bimonthly Report no. 244 (Feb. 
2006): 1-10, www.arl.org/newsltr/244/assets (accessed Jan. 16, 
2007). 

4. Jeffrey Horrell and Martin Wybourne, "Archiving the Digital 
Age: How Do We Preserve Our Present for the Future?" Vox 
of Dartmouth, July 25, 2005. 

5. LOCKSS, www.lockss.org/lockss/Home (accessed Jan. 16, 
2007). 

6. CLOCKSS, www.lockss.org/clockss/Home (accessed Jan. 16, 
2007). 

7. PORTICO, www.portico.org/ (accessed Jan. 16, 2007). 

8. Library of Congress, National Digital Information 
Infrastructure and Preservation Program, 
www.digitalpreservation.gov/index.html (accessed Jan. 16, 2007). 

9. Library of Congress, "WEB Capture," 
www.loc.gov/webcapture/index.html (accessed Jan. 16, 2007). 

10. Sustainability of Digital Formats: Planning for Library of 
Congress Collections, "NCBIArchJ, NCBI/NLM Journal 
Archiving and Interchange DTD, version 1" (Apr. 19, 2006), 
www.digitalpreservation.gov/formats/fdd/fdd000174.shtml 
(accessed Oct. 27, 2007). 

11. "MetaArchive," www.metaarchive.org (accessed Jan. 16, 
2007). 

12. San Diego Supercomputer Center, News Center, "The 
National Archives and the San Diego Supercomputer Center 
Sign Landmark Agreement to Preserve Critical Data" (June 
27, 2006), www.sdsc.edu/News%20Items/PR062706.html 
(accessed Jan. 16, 2007). 

13. Internet Archive, Wayback Machine, www.archive.org/web/ 
web.php (accessed March 31, 2007). 

14. UK Government Web Archive, www.nationalarchives.gov.uk/ 
preservation/webarchive (accessed Jan. 16, 2007). 



15. Internet Archive, "Archive-It: Archiving the Internet for 
Future Generations," www.archive-it.org (accessed Jan. 16, 
2007). 

16. Digital Asset Management: Elements of an Institutional 
Program — Final Report on the Duke/Dartmouth Project 
(Draft of 30 November 2006), www.dartmouth.edu/~library/ 
col/docs/0607/DukeDartmouth.pdf (accessed Dec. 9, 2007). 

17. Horrell and Wybourne, "Archiving the Digital Age," 7. 

18. Wess Jolley, College Records Manager, Dartmouth College, 
personal conversation with the author, March 17, 2006. 




Su Chen (suchen@umn.edu) is Head 
of the East Asian Library, University of 
Minnesota Libraries, Minneapolis; 
Chengzhi Wang (cw2165@columbia 
.edu) is Chinese Studies Librarian, C. 
V. Starr East Asian Library, Columbia 
University, New York. 

The authors wish to thank Marcia 
Pankake for her inspiration to write the 
article and her invaluable advice and 
comments during the writing. Her guid- 
ance and help are greatly appreciated. 
The authors are indebted to Bob Nardini 
and Carolyn Morris, Bibliographers at 
YBP for helping with data collection 
and proofreading drafts. The authors 
are grateful to Peggy Johnson and her 
anonymous reviewers for their helpful 
advice and comments for revision. The 
authors thank Barbara Davis for her edit- 
ing of the final draft. 

Submitted January 26, 2007; tentatively 
accepted pending revision March 15, 
2007; revised and resubmitted May 2, 
2007, and accepted for publication. 



Who Has Published 
What on East Asian 
Studies? 

An Analysis of Publishers 
and Publishing Trends 

By Su Chen and Chengzhi Wang 

This study examines Western-language, particularly English-language, mono- 
graphs on East Asian studies published in the United States, Canada, England, 
Australia, and other countries from 2000 through 2005. The study provides a 
landscape view of the scope and trends of publications for both scholars and 
librarians in East Asian studies. The data for this study were collected from 
YBP's GOBI (Global Online Bibliographic Information) database, covering publications 
profiled by YBP from January 1, 2000, through December 31, 2005. The 
results of data analysis shed light on scholarly currents and publishing trends in 
East Asian studies over that six-year period. 

Scholars and librarians in East Asian studies often wonder how research pro- 
ductivity and publishing trends evolve in the field. Which publishers are active 
in this field? What subject areas have been covered prolifically or meagerly, and 
what does the publishing landscape look like? Traditionally the areas of East 
Asian literature, history, and philosophy have been strongly represented. Is this 
still so? Have traditional trends experienced any shifts? Which publishers are the 
major players in the field? Do university presses publish in different areas from 
commercial publishers? 

Some fifty years ago, Frederick Mote (1922-2005), a leading professor of 
Chinese history and culture at Princeton University, raised similar questions. He 
surveyed important academic publishers and their major publications, introduc- 
ing new publishing developments in Chinese studies in the Republic of China 
on Taiwan to the Journal of Asian Studies audience. 1 He wrote, "Although the 
Journal has on several occasions during the last five or six years reported briefly 
on publication there, now there is perhaps some value in reporting more com- 
prehensively on recent developments, both because the phenomenon itself is of 
interest, and because many recently published items will be desired by scholars 
and by research libraries." 2 The authors of this article share his rationale in the 
examination of recent scholarly currents and publishing trends. 

The purpose of Mote's survey was "not to list all of the worthwhile books 
recently published, for that would be an obvious impossibility, but to make 
the general outlines and character of recent publication activities known, and 
to inform the reader of names and addresses of publishers from whom more 
detailed information can be obtained." 3 Today, however, improved technology 
can be utilized to achieve the goal of a fairly complete survey. Technologically, all 
worthwhile books recently published can be listed and analyzed. 
The authors used YBP's Global Online Bibliographic 
Information (GOBI) database in hopes of providing a com- 
prehensive analysis of nearly all publications profiled. The 
analysis helped reveal characteristics of these publications 
and East Asian studies publishing trends. 

This study's purpose was to evaluate the scope of, and 
trends in, East Asian research and publications. In the 
study, the authors use the term "East Asian studies" to refer 
to studies on China, Japan, and Korea; "Chinese studies" 
includes the People's Republic of China, Taiwan, Hong Kong, 
Macao, and Tibet. Within "Korean studies," both North 
Korea and South Korea are covered. In this study, the focus 
is on print books. The scope includes English-language 
monographs on East Asian studies published in the United 
States, Canada, England, Australia, and other countries 
between 2000 and 2005. The data were collected from 
GOBI, covering publications profiled by YBP from January 
1, 2000, through December 31, 2005. 

Research and publishing trends are of interest to 
publishers, though largely from the perspective of sales. 
The Association of American Publishers Industry Statistics 
Annual Report registers data based on publishers' responses 
to questionnaires, collecting data on the sale of books in 
the category of "Professional and Scholarly Publishing," 
which covers categories of technical, scientific, law, busi- 
ness, humanities, and medical materials. 4 However, data and 
information on publications pertaining to Asian studies or 
East Asian studies are not readily available. 

Research and publishing trends also are of interest 
to governmental and nongovernmental organizations. For 
instance, the Tokyo-based Centre for East Asian Cultural 
Studies for UNESCO (1961-2003) published the annual 
Asian Research Trends: A Humanities and Social Science 
Review from 1991 through 2003. 5 After a hiatus of a few 
years, the annual was continued by Toyo Bunko in 2006. 
This annual publication provided useful information on 
the history and trends in research topics on both macro 
and micro levels in Asian studies, particularly in East 
Asian studies, and in various countries and regions. The 
Ford Foundation sponsored the editing and publishing of 
such bibliographies on Asian studies as India and America: 
American Publishing on India, 1930-1985, which traced 
the historical development of American publishing on 
Indian studies. 6 The Japan Foundation has had multiple 
initiatives to introduce new publications and help library 
collection development. Its Japanese Book News often 
has included articles on publishing trends on Japan since 
1993. 7 With sponsorship from the Japan Foundation and 
others, the North American Coordinating Council on 
Japanese Library Resources (NCC) is known for facilitating 
collection development and resource sharing on 
Japanese studies. 



Scholarly and publishing trends always are of interest 
to scholars and librarians. Numerous monographs, journals, 
journal articles, special issues of journals, and conference 
proceedings have been devoted to East Asian 
studies, particularly country-specific studies. For example, 
Hardacre analyzed the postwar development of Japanese 
studies in the United States in various subject areas. 8 The 
Taiwan-based journal Issues & Studies: An International 
Quarterly on China, Taiwan and East Asian Affairs pub- 
lished a special issue, "The State of the China Studies 
Field," edited by Marble. 9 In addition to articles on vari- 
ous subject areas of Chinese studies, the special issue also 
included commentaries from editors of five leading journals 
that represented the state of scholarly journals in this field. 
Furthermore, it contained a useful bibliographical appen- 
dix of articles on the state of the China studies field. For 
Korean studies, the conference on the "Future of Korean 
Studies in the United States" was held at the University of 
California, Berkeley, in 2001 with the support of the Korean 
Foundation. 10 



Literature Review 

The authors searched general and subject databases for 
literature related to trends in scholarly publications on 
East Asian studies; however, the data were unexpectedly 
scarce. Publications with comprehensive data and analyses 
were especially limited. Articles by scholars and librarians
who sought to survey, review, and analyze the state
of scholarship did, however, examine
publishing trends in East Asian studies as part of
their research.

Mote's 1958 survey of leading publishers and their 
major publications in Taiwan in 1954-1955 was of great ben- 
efit to scholars and librarians because it outlined and charac- 
terized recent relevant publications of major publishers and 
listed contact names and addresses for these publishers. 11 
Soong discussed the most recent developments in Chinese 
publishing by examining the five issues of the Ch'uan-kuo 
hsin shu-mu (The National Bibliography) published in 1973, 
identifying and introducing a number of new important 
works. 12 It was informative and insightful, especially during
the Cultural Revolution (1966-1976), when information on
Chinese publications was scarce, but the article was based
on data from only five issues of the monthly, and only a
small number of books published were of long-term academic
value.

The Committee on East Asian Libraries Executive 
Group of the Association of Asian Studies issued a report on 
current trends as part of an Association of Research Libraries
project titled "Scholarship, Research Libraries and Foreign
Publishing." 13 This general report estimated current trends
of East Asian publishing along with extensive availability of
electronic resources, described historical and current collect- 
ing patterns, analyzed library response to price trends, and 
gauged trends of scholarly research. However, "the report 
does not attempt in-depth analysis and at best provides only a 
sketchy picture of the complex array of problems facing East 
Asian studies librarians in these rapidly changing times." 14 
Though the information in the article on East Asian library 
collections was presented with adequate statistical data from 
individual East Asian libraries, the publishing trends in East 
Asia were not supported with detailed data and information, 
and the trends of the English-language publishing in the 
world, particularly North America, were not mentioned. 

Blum's 2002 book review covered seven books on 
Chinese ethnic minorities and attempted to trace a decade 
of publishing about China's ethnic minorities. 15 It analyzed 
scholarly trends, guiding topics, and theoretical debates in 
the field of Chinese ethnic studies from the 1990s. Blum 
examined and analyzed a large number of works on ethnic 
minorities in order to present major research themes and 
publication trends. Despite the lack of detailed statistical 
data, Blum, as with Mote, offered one of the few publica- 
tions of in-depth analysis of research trends. 

Shulman provided an interesting and useful annotated 
bibliography of books and doctoral dissertations on library 
and information science related to East Asia completed or 
published between 1999 and 2004. 16 He did not indicate, 
however, how he collected the data, simply reporting that 
he called for authors and librarians to submit information. 
Without more systematic data collection, he may have left 
out some relevant titles. 

Leung, Chan, and Song examined the publishing trends 
in Chinese medicine and related subjects via search and 
analysis of records documented in OCLC's WorldCat. Their 
study aimed to give an overview of how Chinese medicine 
had been interpreted and presented to the non-Chinese 
world, and to identify emerging trends. They analyzed the 
publishing trends in Chinese medicine and related subjects 
in all languages except Chinese, ranging from books and 
serials to audio-visual and electronic resources from the 
past thirty years. Their findings showed publications in 
Chinese medicine and related subjects flourished from the 
1970s, and materials in English constituted the major por- 
tion of total output. This study is notable, as it is one of the 
few to comprehensively examine publishing trends in one 
subject area by utilizing the online search technologies of 
a very large bibliographic database. However, as with the 
Committee on East Asian Libraries Executive Group of the 
Association of Asian Studies, the data collected were library 
records that focused on collection development rather than 
publishing trends. 17

A considerable amount of literature has sought to 
examine the state of scholarship, research currents, and
publishing trends related to East Asian studies, particularly
country-specific studies. The literature is conducive to a 
better understanding of the development of East Asian 
studies scholarship and librarianship. Yet a systematic, com- 
prehensive examination and analysis of publishing trends in 
East Asian studies and the specific countries and regions has 
been lacking. 

Of the literature examined, only Leung, Chan, and 
Song collected comprehensive, quantitative data. Most of 
the literature offered no more than general observation or 
sketchy impressions of publishing trends in East Asian stud- 
ies. With the help of distributed information technology, the 
authors aimed to gather more comprehensive quantitative 
data and present their analysis of publishing trends of East 
Asian studies. 

Research Method: A Different Approach 
for Gauging Publishing Trends in East 
Asian Studies 

YBP is one of the largest academic book distributors in North 
America and served as the source of data. The data used in 
this study were collected on June 14, 2006, from GOBI, YBP's 
proprietary database. At that time, the database contained 
approximately 3 million English-language titles published 
worldwide, including publications from more than 40,000 
publishers outside Europe; from 6,300 European publishers 
listed by Lindsay & Croft, a United Kingdom-based YBP 
subsidiary focusing on the United Kingdom; and from other 
European academic book supply and library services. 

Every year, YBP profiles approximately 55,000 new 
titles published by 1,800 publishers (about 1,100 titles every 
week), adding the resulting bibliographic detail to GOBI. 18 
Profiling refers to selecting and describing books that match 
academic libraries' profiles, or collecting interests, as out- 
lined by collection development librarians. The profiled 
books contain detailed bibliographic and imprint informa- 
tion as well as YBP subjects and geographical descriptors 
added book-in-hand by YBP bibliographers. YBP practice 
also is guided by cataloging rules and such tools as AACR2
(Anglo-American Cataloguing Rules, 2nd ed.); Library of 
Congress classification schedule; and Library of Congress 
subject headings, including geographical headings applied 
to profiled books. These features permit online search func- 
tions similar to those of most online public access catalogs 
(OPACs). In addition, some unique features, such as profil- 
ing dates, are available. 

Most of the English-language monographs profiled 
by YBP are searchable online in GOBI. They have been 
published in the United States, Canada, England, Hong 
Kong, or Australia. Other countries and regions are less 
represented. Publications by associations and societies, such
as the Association for Asian Studies, are not necessarily
profiled by YBP or included in GOBI. Overall, the publish- 
ers excluded from the database constitute a relatively small 
portion of the total. The database covers approximately 90 
percent of the publishers of English-language materials on 
East Asian studies. YBP updates its publisher coverage on a 
regular basis. 

In order to retrieve the publications on East Asian stud- 
ies, the following search criteria were used: 

• Publishing dates: 2000 through 2005. 

• Profiling dates: January 1, 2000 through December 
31, 2005. 

• Scope: Publishers covered by YBP and Lindsay & 
Croft. 

• Content level (these categories are assigned by YBP): 
general academic, advanced academic, popular, and 
professional; the juvenile category was excluded. 

• Geographical descriptors: China, Japan, Korea, Hong 
Kong, Taiwan, Macao, and Tibet. Because Tibet is 
treated as an individual entity in GOBI, apart from 
China, Tibet was used as the geographical descriptor 
to capture the relevant data on Tibet. 

• Title count: The same title by the same author pub- 
lished simultaneously in the United States and the 
United Kingdom was counted as one title. Hardcover 
and paperback editions for the same title by the same 
author also were counted as one title. 
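
The title-count rule in the last criterion can be illustrated with a short Python sketch. It is not GOBI's or the authors' actual procedure, which the article does not describe at that level of detail. The `Record` fields, the `normalize` helper, and the sample records are all hypothetical, and the sketch assumes only that each record carries a title, an author, a place of publication, and a binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One hypothetical bibliographic record; field names are illustrative."""
    title: str
    author: str
    country: str   # place of publication, e.g. "US" or "UK"
    binding: str   # "hardcover" or "paperback"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def count_titles(records: list[Record]) -> int:
    """Count unique (title, author) pairs, ignoring country and binding."""
    return len({(normalize(r.title), normalize(r.author)) for r in records})


# Invented sample data: three editions of one work plus one other work.
records = [
    Record("A Sample History of China", "Lee, A.", "US", "hardcover"),
    Record("A Sample History of China", "Lee, A.", "UK", "hardcover"),
    Record("A Sample History of China", "Lee, A.", "US", "paperback"),
    Record("Sample Essays on Japan", "Kim, B.", "US", "paperback"),
]
print(count_titles(records))
```

Keying on the title and author alone is what collapses simultaneous United States and United Kingdom releases, and hardcover and paperback editions, into a single count.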

Additionally, search result quality control was main- 
tained by repeating the search at different times and 
comparing results. The same search strategy with identical 
criteria was repeated in May and June 2006 to see if the 
retrieved data differed. The numbers of hits were found 
to be consistent on five separate occasions. The repetition 
of searches and the comparison of results were conducted 
to ensure the reliability of both the criteria defined and the 
method applied. Such criteria and method may be applied 
for examining other subject areas. 

Findings: Who Has Published What? 

This study resulted in interesting findings on publishing 
trends, particularly output of publications, distribution of 
output with publishers, and representation of subject areas 
over the years. The total number of published monographs 
listed in GOBI on East Asian studies from 2000 through 2005 
was 4,924, of which 2,710 monographs were on China, 1,854 
on Japan, and 360 on Korea (see table 1). China remained 
the major focus of scholarly interest in the period of study, 
accounting for more than half of the published monographs 
each year. During these six years, the aggregated number of
monographs on China was about 1.46 times as many as those
on Japan, and 7.52 times as many as those on Korea. 

From 2000 through 2005, publishing on East Asian 
studies as a whole experienced significant growth — the total 
number of published monographs increased from 633 in 
2000, to 928 in 2005. However, publishing in each of the 
three areas — China, Japan, and Korea — experienced dips 
in different years during the period. As shown in figure 1, 
Japanese studies experienced nearly the same decline in 
2003 and 2004, but recovered in 2005, surpassing its 2002 
total. For Chinese studies, the decline existed in 2005, with 
488 monographs compared to 522 in 2004. For Korean 
studies, the dip occurred in 2004, with 47 monographs pub- 
lished, compared to 66 produced one year earlier. The year 
2005 was a very productive one for Korean studies, with 84 
monographs produced, almost double the output of 2000. 
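
The totals, ratios, and year-to-year dips cited above can be recomputed from the counts in table 1. The following Python sketch is illustrative only: the variable and function names are assumptions, and the counts are transcribed from table 1.

```python
# Counts transcribed from table 1: year -> (China, Japan, Korea).
TABLE_1 = {
    2000: (355, 233, 45),
    2001: (396, 320, 55),
    2002: (475, 335, 63),
    2003: (474, 303, 66),
    2004: (522, 307, 47),
    2005: (488, 356, 84),
}

# Row totals per year and column totals per country.
yearly_total = {year: sum(counts) for year, counts in TABLE_1.items()}
china, japan, korea = (sum(col) for col in zip(*TABLE_1.values()))
grand_total = china + japan + korea

# China's share of each year's output.
china_share = {year: counts[0] / yearly_total[year]
               for year, counts in TABLE_1.items()}


def dips(column: int) -> list[int]:
    """Years in which the count fell below the previous year's count."""
    years = sorted(TABLE_1)
    return [y for prev, y in zip(years, years[1:])
            if TABLE_1[y][column] < TABLE_1[prev][column]]


print(grand_total, china, japan, korea)
print(round(china / japan, 2), round(china / korea, 2))
print(min(china_share.values()) > 0.5)
print(dips(0), dips(1), dips(2))
```

Rounded to two decimals, the China-to-Korea ratio computes to 7.53, so the 7.52 reported above appears to be truncated rather than rounded. The dip check also flags 2003 for China, where the count fell by a single title from 475 to 474.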

Publishers with an output of 10 or more monographs
accounted for nearly half of the total output of monographs
on East Asian studies. In 2005, 22 (of 325) publishers published
10 or more monographs (see table 2). Among the 22
publishers, 10 were university presses; their total output was
174. Of the rest, 12 commercial publishers produced 229
monographs. In terms of productivity, commercial presses
were more productive and active than their counterparts in
the academic world. On average, commercial publishers
produced 19 titles, while university presses only produced
8 books.

Table 1. Number of monographs published on China, Japan,
and Korea, 2000-2005

Years    Total    China    Japan    Korea
2005       928      488      356       84
2004       876      522      307       47
2003       843      474      303       66
2002       873      475      335       63
2001       771      396      320       55
2000       633      355      233       45
Total    4,924    2,710    1,854      360

Figure 1. Number of monographs published on China, Japan,
and Korea, 2000-2005 (chart not reproduced; horizontal axis: year)

During the six years, an increasing number of publish- 
ers began publishing monographs on East Asian studies. In 
2000, 217 publishers were printing monographs on East 
Asian studies, only 14 of whom produced 10 or more. In 
2005, the number of publishers increased to 325 (108 more 
than in 2000), and the number of publishers producing 10 
or more monographs jumped to 22. 

Among university presses, the University of Hawaii
Press was a leading publisher during the six years, producing
183 books total and about 30 books per year. Oxford
University Press gradually reduced its output over the
years to only 11 titles in 2005, compared to 31 in 2000.
RoutledgeCurzon and Routledge were leaders among commercial
publishers since 2000; Routledge produced a total
of 154 books over the past six years. After Routledge and
Curzon joined forces in 2003, RoutledgeCurzon had a total
output of 164 monographs, about 55 books per year for
2003 through 2005. Some publishers, such as M. E. Sharpe,
published much less in 2003 and 2004, and printed only 6
monographs in 2005. In contrast, Global Oriental, a newly
emerging commercial publisher, occupied a spot among the
top ten in the changed publishing landscape of 2005.

Table 2. Publishers producing ten or more titles in 2005

Publishers                        China  Japan  Korea  Total
Routledge                            25     12      2     39
University of Hawaii Press           14     16      7     37
RoutledgeCurzon                      15     15      -     30
Hong Kong University Press           24      2      -     26
Palgrave Macmillan                   13     13      -     26
Tuttle Publishing                     1     22      1     24
Stanford University Press            19      2      1     22
Brill                                17      1      -     18
Columbia University Press            12      5      1     18
Global Oriental                       -     11      7     18
Kodansha                              -     16      -     16
Chinese University Press             14      -      -     14
Harvard University Asia Center        7      7      -     14
Marshall Cavendish Academic          12      1      -     13
Edwin Mellen                         11      1      -     12
Rowman & Littlefield                 10      2      -     12
University Press of America           4      2      6     12
Oxford University Press               6      3      1     11
University of Washington Press        7      4      -     11
Cambridge University Press            4      6      -     10
Hotei Publishing                      -     10      -     10
Kegan Paul International              1      8      1     10
Total                               216    159     27    403

In terms of country-specific analysis, publishers seem to 
have reshaped the boundaries of their coverage, especially 
in newly emerged fields within East Asian studies. Global 
Oriental focused exclusively on Japan and Korea (see table 
2), Kodansha focused on various subjects of Japan, and 
Hotei focused its attention solely on Japanese arts. Chinese 
University Press, conversely, focused solely on China in 
2005, unlike in previous years, when it printed a few titles 
on Japan. 

A variety of subject areas were represented, except for 
general works (that is, collections, series, collected works, 
encyclopedias, dictionaries and other general reference 
works, indexes, museums, newspapers, periodicals, academies
and learned societies, yearbooks, almanacs, directories,
histories of scholarship and learning, the humanities). Only 
1 title described as general works on China was published 
in 2004. 

The published monographs were unevenly distrib- 
uted across subject areas. Overall, history, language and 
literature, and fine arts continued to grow and dominate 
the publishing landscape (see appendix). The aggregated 
numbers of monographs published on these three subject 
areas increased from 157, 96, and 48 in 2000, to 191, 189, 
and 128 in 2005, respectively. The increased output on his- 
tory in 2005 was 34 titles more than in 2000. The increase 
in literature was more notable, with 189 in 2005, doubling 
the 2000 output of 96. The biggest increase was in fine arts; 
the total output of 128 in 2005 was almost three times the 
total titles (48) produced in 2000. Activities in social sciences 
increased, with the total number of published monographs 
increasing from 159 in 2000, to 177 in 2005. Titles in social 
sciences, containing disciplines of economics, commerce, 
finance, and sociology, had outnumbered the fine arts. 
Auxiliary sciences of history, ethnic, music, medicine, agri- 
culture, naval science, and bibliography were low in 2000 
and remained low in 2005. The number of monographs on 
geography, anthropology, philosophy and religion, technol- 
ogy, and military science was higher, while the total number 
of monographs on political science experienced a slight 
decline, from 34 in 2000, to 24 in 2005. Law and education 
experienced erratic changes over the years. 
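
The subject-area comparisons in this paragraph reduce to simple differences and ratios between the 2000 and 2005 counts. The Python sketch below recomputes them from the figures quoted in the text; the dictionary and function names are illustrative assumptions, not part of the study.

```python
# (2000 count, 2005 count) as quoted in the paragraph above.
SUBJECT_COUNTS = {
    "history": (157, 191),
    "language and literature": (96, 189),
    "fine arts": (48, 128),
    "social sciences": (159, 177),
    "political science": (34, 24),
}


def change(first: int, last: int) -> tuple[int, float]:
    """Return (absolute change, ratio of the later count to the earlier one)."""
    return last - first, last / first


for subject, (first, last) in SUBJECT_COUNTS.items():
    diff, ratio = change(first, last)
    print(f"{subject:<24} {diff:+4d}  x{ratio:.2f}")
```

Reporting both the absolute change and the ratio makes the contrast visible: history added the most titles after literature and fine arts, but fine arts grew fastest in proportion to its 2000 base.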






The output of published monographs was not evenly 
distributed in terms of country coverage. In most years, the 
numbers of monographs on various subject areas pertaining
to China were larger than those pertaining to Japan, except
in agriculture and technology. Over the six years, a total
of 34 monographs on agriculture and 111 monographs on 
technology pertaining to Japan were published. 

Special mention should be made about the publishing 
trends represented by the monographs and their subject 
areas from the leading presses, such as the University 
of Hawaii Press and Routledge. Over the six years, the 
University of Hawaii Press published a total of 184 mono- 
graphs, of which 82 were on China, 81 on Japan, and 21 on 
Korea. Routledge produced a total of 148 titles, of which 73 
were on China, 62 on Japan, and 13 on Korea. Similar to the 
trend of the larger publishing world discussed above, both
experienced increases in monographs over the years, yet
the increase was far from linear, with up-and-down changes 
over the period. The concentrations of subject areas also 
were similar to that of the larger publishing world. Over 
the study's years, the University of Hawaii Press focused on 
language and literature, with 23 titles published on China, 
25 titles on Japan, and 7 titles on Korea. In religion and phi- 
losophy, the University of Hawaii Press published a total of 
21 titles on China, 13 titles on Japan, and 2 titles on Korea. 
East Asian history and fine arts was the third largest subject 
area concentration, with nearly identical numbers of mono- 
graphs on China and Japan. 

Sharing the general trends of the larger publishing 
world, Routledge, however, was much different from the 
University of Hawaii Press, particularly in terms of subject 
area concentrations. Over the years, Routledge's primary 



Table 3. Subject areas in which the University of Hawaii Press published on China, Japan, and Korea, 2000-2005



China Japan Korea 

LCC Subjects 00 01 02 03 04 05 00 01 02 03 04 05 00 01 02 03 04 05 



G Geography, anthropology, 
recreation 



3 



A General works 

B Religion, psychology, 152445 223 

philosophy 

C Auxiliary sciences of history 1 
DS East Asian history 1 2 2 3 4 2 3 3 1 

E Ethnic 1 1 



1 12 1 

H Social sciences 3 3 3 1 2 2 4 2 1 



J Political science 1 111 

K Law 1 

L Education 

M Music and books on music 

N Fine arts 1 113 2 12 12 4 

P Language and literature 521053 624376 

Q Science 1 

R Medicine 

S Agriculture 1 

T Technology 

U Military science 

V Naval science 

Z Bibliography, library science, 
information resources 



Note: LCC=Library of Congress Classes; years are displayed by their last two digits, e.g., 2000=00.






emphasis was on the social sciences, with 32 titles published 
on China, 22 titles on Japan, and 7 titles on Korea. Next was 
East Asian history, with a total of 17 titles on China, 16 titles 
on Japan, and 2 titles on Korea. Political science followed as 
the third largest area of focus, with 9 monographs printed 
on China, 6 on Japan, and 2 on Korea. Unlike the University 
of Hawaii Press, Routledge had only 1 title on fine arts over 
the years. 

As noted previously, Tibet is included in Chinese stud- 
ies. The field of Tibetan studies has received increasing 
attention; thus, publishing in this area warrants closer review.
The roots of Tibetan scholarship in the United States can be 
traced back to the 1960s, when federal funding and count- 
less shipments of Tibetan texts from India fueled new pro- 
grams at such research universities as Columbia, Harvard, 
and Indiana — often in departments of religion or Sanskrit 
studies. In addition, the Tibetan Buddhist Learning Center 
based in Washington, New Jersey, produced several talented 
dharma students-cum-translators who subsequently entered
mainstream academia and now hold chairs at these research
institutions. Their research and instruction, coupled with 
greater access for fieldwork in China, has expanded the 
range of Tibetan research from Oriental studies and phi- 
lology to religious studies, history, anthropology, cultural 
studies, comparative literature, and the social sciences. 
Grassroots activism in the 1990s also may have presented 
Tibet as a possible field of study in the minds of young students. 19
The prominence of the Dalai Lama as a charismatic religious leader of
Tibet also has been conducive to this development.

The growth in Tibet-related publishing is evident. 
From 2000 through 2005, 98 monographs were published, 
accounting for 3.6 percent of the Chinese studies mono- 
graph output in the six years. No publications appeared in 
2000 and 2001. In 2002, 2 titles related to description of 
and travel to Tibet were published. However, 2003 saw an 
increase in monograph publishing, to an annual output of 29 
titles, more than 6 percent of the total monographic output 
in Chinese studies that year, second to the yearly output of 



Table 4. Subject areas in which Routledge published on China, Japan, and Korea, 2000-2005



LCC Subjects 

A General works 

B Religion, psychology, philosophy 

C Auxiliary sciences of history 

DS East Asian history 

E Ethnic 

G Geography, anthropology, 
recreation 

H Social sciences 

J Political science 

K Law 

L Education 

M Music and books on music 

N Fine arts 

P Language and literature 

Q Science 

R Medicine 

S Agriculture 

T Technology 

U Military science 

V Naval science 

Z Bibliography, library science, 
information resources 



00 01 



China 

02 03 



04 



Japan 

05 00 01 02 03 04 05 

2 12 2 

5 2 2 7 1 2 2 
1 



1 



1 



1 



12 4 4 7 1 3 3 

3 12 1 2 
1 

1 1 1 



Korea 

00 01 02 03 04 05 



1 3 2 
2 



Note: LCC=Library of Congress Classes; years are displayed by their last two digits, e.g., 2000=00.






39 titles produced in 2004. In 2005, 28 titles were published, 
accounting for nearly 6 percent of the output of monograph 
publishing on Chinese studies that year. According to YBP 
categories, 41 of the 98 works were research level books, 25 
titles were supplementary, and 31 titles were basic. Various 
publishers participated in the Tibetan publishing. Snow 
Lion published 17 titles, followed by Shambhala, which 
produced 13 titles. Brill of the Netherlands produced 3, the
University of California Press produced 2, and Oxford
University Press and the University of Washington Press
each produced 1. The number of subject areas was not as 
diverse as that of the larger field of Chinese studies. Of the 
98 titles, 45 titles were on religion, 28 on history, 10 on fine 
arts, 5 on language and literature, 3 on social science, 2 each 
on education and geography, 1 each on science and technol- 
ogy, and 1 related to America. Of the 45 religious titles, 42 
titles were on Buddhism. In recent years, publishing in the 
field of Tibetan studies centered on the subject areas of reli- 
gion, history, and fine arts, particularly on religion. 
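
The percentages quoted for Tibet-related publishing follow from the yearly counts given above and the China totals in table 1. A minimal Python sketch of that arithmetic is shown below; the names are illustrative assumptions, and the zero counts for 2000 and 2001 reflect the statement that no publications appeared in those years.

```python
# Tibet-related titles per year, as reported in the text.
TIBET = {2000: 0, 2001: 0, 2002: 2, 2003: 29, 2004: 39, 2005: 28}
# China totals per year, transcribed from table 1.
CHINA = {2000: 355, 2001: 396, 2002: 475, 2003: 474, 2004: 522, 2005: 488}


def share_percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


tibet_total = sum(TIBET.values())
overall = share_percent(tibet_total, sum(CHINA.values()))
by_year = {year: share_percent(TIBET[year], CHINA[year]) for year in TIBET}

print(tibet_total, overall)
for year in sorted(by_year):
    print(year, by_year[year])
```

The same helper could be applied to the subject breakdown of the 98 titles to express each subject as a share of Tibet-related output.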

Conclusion 

The findings have provided information on the general out- 
put of English-language monographs of East Asian studies 
and on specific countries, on the productivity of university 
presses and commercial publishers, on the concentrations 
of subject areas, and on the changes over the six years 
examined. The study also looked at trends of research in 
East Asian studies. It not only sheds light on the publishing 
trends, but also helps in understanding the development of 
East Asian studies as a field. 

This study is not without limitations, however. GOBI 
was not exhaustive. The data selected from GOBI did not 
include every English-language monograph published in the 
field of East Asian studies. Moreover, in the sampled data, 
some publishers may have published the same monograph 
under a different name. These publications are duplicates, 
just with different titles. Thus, the authors inadvertently 
may have included some instances of duplicates. 

Nonetheless, these findings are significant for both 
scholars and librarians. The results summarized here pro- 
vide an analysis and overview of what has been published 
and how the field of East Asian studies has evolved in recent 
years. This can help those concerned with the field make 
informed decisions regarding scholarly research and collec- 
tion development. In particular, understanding publishing 
trends in relation to institutional and library priorities can 
help inform budget allocations for collections and approval- 
plan profile revision. For example, traditionally, East Asian 
studies collections in many library systems fall within the 
broader heading or organizational division of Arts and 
Humanities. These collections are more heavily weighted
in the areas of arts and humanities, and budget allocations
usually correlate to that division, particularly in small- to 
medium-sized libraries. By inviting re-evaluation of such a division,
the findings presented in this paper can help library
decision-makers not only avoid missing pertinent
publications but also rethink policies
and practices to better meet user needs associated with
growing interest in other programs, such as the social sciences.
In addition, this study can serve as a model for research into 
publishing trends and patterns of other subject areas and 
academic disciplines. 

References and Notes 

1. Frederick W. Mote, "NOTES: Recent Publication in Taiwan," 
Journal of Asian Studies 17, no. 4 (1958): 595-606.

2. Ibid., 595. 

3. Ibid. 

4. Association of American Publishers, "Industry Statistics," www 
.publishers.org/industry/index.cfm (accessed June 24, 2006). 

5. Asian Research Trends: A Humanities and Social Science 
Review (Tokyo: Centre for East Asian Cultural Studies, 
1991-2003). 

6. N. Gerald Barrier, India and America: American Publishing 
on India, 1930-1985 (New Delhi: American Institute of 
Indian Studies, 1986). 

7. Japan Foundation, "Publications," www.jpf.go.jp/e/publish/ 
index.html (accessed June 24, 2006). 

8. Helen Hardacre, The Postwar Development of Japanese 
Studies in the United States (Boston: Brill, 1998). 

9. Andrew D. Marble, "Special Issue: The State of the China 
Studies Field," Issues & Studies 38/39, no. 4 (2002/2003): 
1-10. 

10. "The Future of Korean Studies in the United States," spon- 
sored by the Center for Korean Studies at University of 
California at Berkeley, May 7-8, 2001, http://ieas.berkeley 
.edu/events/2001.05.07-08.html (accessed June 24, 2006). 

11. Mote, "NOTES." 

12. James Chu-Yu Soong, "Chinese Publications in Early 1973," 
Journal of Asian Studies 33, no. 2 (1973): 289-93.

13. Committee on East Asian Libraries Executive Group of the
Association of Asian Studies, "East Asian Collections: A
Report on Current Trends Written as Part of the Association of
Research Libraries' Project: Scholarship, Research Libraries,
and Foreign Publishing in the 1990s," Committee on East
Asian Libraries Bulletin 100 (1993): 88-109.

14. Ibid., 89.

15. Susan D. Blum, "Margins and Centers: A Decade of
Publishing on China's Ethnic Minorities," Journal of Asian
Studies 61, no. 4 (2002): 1287-1310.

16. Frank Joseph Shulman, "Doctoral Dissertations Concerned 
with Library and Information Science, Publishing, and Books: 
An Annotated Bibliography of Studies Relating to East Asia 
Completed between 1999 and 2004," Journal of East Asian 
Libraries 134 (2004): 1-28. 

17. Shirley Leung, Kylie Chan, and Lisa Song, "Publishing Trends
in Chinese Medicine and Related Subjects Documented in
WorldCat," Health Information and Libraries Journal 23,
no. 1 (2006): 13-22.

18. Information about YBP business practices and GOBI cover- 
age provided by Robert Nardini, then senior bibliographer at 
YBP, at the time data were collected. 

19. Lauren Hartley, Tibetan studies librarian, C. V. Starr East
Asian Library, Columbia University, New York, conversation
with Chengzhi Wang, Apr. 7, 2007; see also chapter six in
Donald S. Lopez, Prisoners of Shangri-la (Chicago: Univ. of
Chicago Pr., 1998) for further information on the development
of Tibetan Studies in the United States.



42 



LRTS 52(1) 



Mary F. Casserly (mcasserly@uamail.albany.edu) is Assistant Director for
Collections and User Services, University
at Albany-SUNY. James E. Bird (jim.bird@umit.maine.edu) is Head, Science and
Engineering Center, Raymond H. Fogler
Library, University of Maine, Orono.

Submitted March 20, 2007; tentatively
accepted pending revision April 30,
2007; revised and resubmitted June 30,
2007, and accepted for publication.



Web Citation Availability 

A Follow-up Study 

By Mary F. Casserly and James E. Bird 



The researchers report on a study to examine the persistence of Web-based con- 
tent. In 2002, a sample of 500 citations to Internet resources from articles pub- 
lished in library and information science journals in 1999 and 2000 were analyzed 
by citation characteristics and searched to determine cited content persistence, 
availability on the Web, and availability in the Internet Archive. Statistical analy- 
ses were conducted to identify citation characteristics associated with availability. 
The sample URLs were searched again between August 2005 and June 2006 to 
determine persistence, availability on the Web, and in the Internet Archive. As in 
the original study, the researchers cross-tabulated the results with URL character- 
istics and reviewed and analyzed journal instructions to authors on citing content 
on the Web. Findings included a decrease of 17.4 percent in persistence, and 8.2 
percent in availability on the Web. When availability in the Internet Archive was
factored in, the overall availability of Web content in the sample dropped from 
89.2 percent to 80.6 percent. The statistical analysis confirmed the association 
between the likelihood that cited content will be found by future researchers and 
citation characteristics of content, domain, page type, and directory depth. The 
researchers also found an increase in the number of journals that provide instruc- 
tion to authors on citing content on the Web. 

Students and researchers look to literature citations as links between what is 
new and what is already known. The value of accurate and valid citations can- 
not be overstated, as citations act as knowledge building blocks. Citations to Web 
resources and documents are increasingly found in scholarly articles, and over 
the past several years a significant body of literature on the stability of citations 
to content on the Web has developed. Many of these studies document decreas- 
ing availability of cited content over time; fewer also have attempted to identify 
factors that contribute to the stability of these citations. Recognizing the citation 
stability problem and knowing the factors that contribute to Web reference stabil- 
ity will help authors, editors, and publishers develop policies and conventions that 
will ensure long-term access to cited Web content. 

In 2002, the authors conducted a study of 500 citations containing URLs 
from articles published in library and information science journals in 1999 and 
2000. 1 In this earlier study, the authors described URLs that led to cited content 
as "permanent." In reporting the findings of the follow-up study, they use the 
term "persistent" in place of permanent. Persistent is now commonly used in the 
literature and better describes the quality being studied. The study addressed the 
following questions: 

• To what extent are authors currently referencing information and docu- 
ments "published" on the Web? 

• What percentage of cited electronic resources is available to be consulted 
by future scholars? How are they most often found? 



52(1) LRTS 



Web Citation Availability 43 



• Is it possible to identify characteristics of citations to 
Internet resources that will help predict the availabil- 
ity of the content to which they refer? 

• What type of guidance are authors receiving from 
editors and publishers? 

In the earlier study the authors found that the majority 
of the citations in the sample contained partial bibliographic 
information and no date viewed. Most URLs pointed to 
content pages with .edu or .org domains and did not include 
a tilde. More than half (56.4 percent) were persistent, and 
81.4 percent were available on the Web; searching the 
Internet Archive increased the availability rate to 89.2 per- 
cent. Content, domain, and directory depth were associated 
with availability. Few of the journals provided instruction 
on citing digital resources. The authors offered suggestions 
for updating scholarly communication citation conventions 
based on these findings. 

The purpose of this subsequent study is to determine the 
changes in the persistence of the URLs included in the 
1999/2000 sample, in the availability of cited content, and in 
the instructions on citing Web content provided by journals 
to their authors. 



Literature Review 

The authors reviewed the literature on Web citation persis- 
tence through 2002 as part of the original study. 2 Since then, 
the published research has addressed Web page persistence 
rates, factors related to persistence, and, to a lesser extent, 
instructions to authors citing Web content. 

Persistence of Web Pages 

Sellitto provided an extensive review of the literature on 
Web site persistence of cited Web resources. 3 He selected 
123 papers from the 1995 to 2003 AusWeb conference 
series, examined the 2,168 references cited, and found 
Web resources in almost 50 percent of them. Of the Web 
resources cited, 45.8 percent were not found at the URL 
cited. Sellitto determined the average half-life for these 
Web citations to be 4.8 years. In a study of papers in derma- 
tological journals published between 1999 and 2004, Wren 
and colleagues found that 81.7 percent of URLs cited in 
papers published in 2004 were available. 4 The percentage 
decreased with time to an availability rate of 65.4 percent 
for URLs cited in papers published in 1999. 

Dellavalle and colleagues examined Internet references 
from the papers of three high-impact scientific journals 
(NEJM, JAMA, and Science) published over a six-week peri- 
od in three different years (2001-2003). 5 They found that 
the number of inactive Internet references increased over 



the twenty-seven-month period, reaching levels that ranged from a high 
of 21 percent for JAMA to a low of 11 percent for Science. Spinellis 
examined the Web citations in two computer science publi- 
cations from 1995 to 1999. 6 Almost 50 percent of the 4,224 
references he checked were not available after four years. 
Dimitrova and Bugeja examined Web references from a 
sample of papers published in communication journals 
between 2000 and 2003, and found that 37 percent of the URLs 
no longer led to the content cited. 7 In April 2004, Crichlow, 
Aguillo, and Prieto examined the Internet references in the 
articles of five major medical journals that were published 
in January 2004, and, by cutting and pasting the URLs into a 
Web browser, found that five were inaccurate (7.4 percent) 
and three had inaccessible sites (4.4 percent). 8 

Markwell and Brooks studied biochemistry and molecu- 
lar biology Web-based educational material and found that, 
in a twenty-four-month period starting in August 2000, more 
than 20 percent of the 515 sites examined had changed con- 
tent, moved with no automatic forwarding, or were broken 
links. 9 Ortega and colleagues compared Web content in 738 
Web sites in 1997 and again in 2004. As part of this study, 
they examined the persistence and stability of the links with- 
in these pages; they found that 74.28 percent of the 145,092 
links were "broken (linkrot) or not operative." 10 Bugeja and 
Dimitrova looked at Internet references cited in Association 
for Education in Journalism and Mass Communication 
online conference papers. 11 They found that after less than a 
year, 55 (51 percent) of the 108 citations led to the Web con- 
tent cited when they clicked on the link. However, when they 
cut and pasted the URLs into a Web browser, the number of 
found citations increased to 65 (60 percent). 

Bar-Ilan and Peritz conducted a broad-ranging study 
of Web pages concerning the subject "informetrics." 12 They 
looked at the number of Web sites retrieved on this subject 
in 1998, 1999, 2002, and 2003 and identified 866 URLs 
in 1998 that met their search criteria. Of those, 299 (34.5 
percent) were not available in 1999, with 496 (57.3 percent) 
and 552 (63.7 percent) not available in 2002 and 2003, 
respectively. Of the 1,297 URLs identified in 1999, 643 
(49.6 percent) were not available in 2002, and 769 (59.3 per- 
cent) not available in 2003. Of the 3,746 URLs identified in 
2002, 682 (18.2 percent) were not available in 2003. Koehler 
continued his study of Web page persistence begun with 361 
URLs identified in 1996 and found that by 2003, two-thirds 
of the original sample of URLs were gone. 13 

Kushkowski analyzed the citations in economic theses 
and dissertations in print and electronic format to see if dif- 
ferent patterns emerged in the citations to Web resources. 14 
He selected master's theses and doctoral dissertations from 
Virginia Tech, where electronic theses and dissertations 
were required, and Iowa State University, where electronic 
copies were not accepted, and found that a small percentage 
of citations in these works were to Web resources (Virginia 



44 Casserly and Bird 



LRTS 52(1) 



Tech had 5.4 percent, and Iowa State had 2.2 percent), 
with Web citations increasing between 1997 and 2002. 
Kushkowski also looked at the persistence of Web citations 
and found that approximately 55 percent of Web citations 
from both universities' theses and dissertations led directly 
to the documents cited. 

Wu briefly discussed the importance of persistence in 
Web documents in legal research, noting, particularly, the 
loss of interim documents, which are of great importance 
to both legal research and analysis. 15 In 1995, librarians at 
the National Library of Australia identified fifty publications 
on the Web that were considered sufficiently important for 
preservation. 16 They found that 22 percent of these publi- 
cations were still available in 2004 at their original URLs, 
and an additional 64 percent were still accessible on the 
Internet, although the content or style of some had changed. 
They predicted that in about five years, users would be able to 
find only 50 percent of the original publications. 

Factors Related to Persistence 

In 2004, Koehler found that three-quarters of the URLs that 
were still available from his 1996 sample were to navigation 
pages (that is, pages found at the server level and first level) 
as opposed to content pages (that is, pages found at the second 
level and below). 17 The depth of the path was linked to 
URL failure in Spinellis's 2003 study. 18 Wren and colleagues 
found that root directories were more likely to be available 
than those URLs with a directory depth of one. 19 They also 
found that the presence of a tilde or accession date did not 
affect URL availability. Dimitrova and Bugeja looked at 
four factors that could be predictors of URL stability and 
found more stability at the top-level sites. 20 They also found 
a positive relationship between year of publication and 
persistence: the more recent the publication year, the 
higher the number of accessible URLs. In addition, they 
discovered that the presence of a retrieval date with a Web 
citation was not related to URL persistence. 
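
The directory-depth and page-type distinctions discussed above can be made concrete with a short sketch. It assumes that depth is the number of directory segments in the URL path (server level = 0) and applies Koehler's split between navigation pages (server level and first level) and content pages (second level and below). The function names and example URLs are hypothetical, and the rule for recognizing file names is an assumption, not a procedure taken from any of the studies cited.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Return the number of directory segments in the URL path (0 = server level)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # Assumption: a trailing segment that contains a dot is a file name,
    # not a directory, so it does not add to the depth.
    if segments and not path.endswith("/") and "." in segments[-1]:
        segments = segments[:-1]
    return len(segments)


def page_type(url: str) -> str:
    """Classify a URL as a navigation page (depth 0 or 1) or a content page."""
    return "navigation" if directory_depth(url) <= 1 else "content"


def has_tilde(url: str) -> bool:
    """Report whether the URL path contains a tilde."""
    return "~" in urlparse(url).path


# Hypothetical URLs, used only to illustrate the three characteristics.
for example in (
    "http://www.example.edu/",
    "http://www.example.edu/library/",
    "http://www.example.edu/~smith/papers/linkrot.html",
):
    print(example, directory_depth(example), page_type(example), has_tilde(example))
```

Coding each cited URL this way would yield the kind of categorical variables (directory depth, page type, tilde) that the studies above relate to persistence.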

Wren and colleagues found that domain was an indi- 
cator of availability, with .edu sites showing the greatest 
availability, followed in descending order by .org, .net, .com, 
and .gov. 21 Sellitto's study showed that .edu sites had the 
highest number of URLs classified as missing, followed by 
.com sites. 22 Over a twenty-four-month period, Markwell 
and Brooks found that .gov sites were the most stable, fol- 
lowed by .org and .edu, respectively, with .com sites being 
the least stable. 23 Dellavalle and colleagues looked at the 
persistence of links by domain and found that after twenty- 
seven months, references with .com domains had the most 
inactive links, followed by .edu, .gov, and .org. 24 In 2005, 
Bugeja and Dimitrova found that .edu sites were the most 
stable, followed by .com and .org sites. 25 In their 2006 study, 
they found that .gov and .org sites were the most stable (73.0 



percent and 71.0 percent active, respectively) followed by 
.com sites at 63.9 percent and .edu sites at 46.8 percent. 26 
Bar-Ilan and Peritz found that between 2002 and 2003, 
80.4 percent of .edu sites in their study were available, as 
opposed to 77.8 percent of .org sites and 55.5 percent of 
.com sites. 27 When they considered availability over a five-year 
period (1998 to 2003), the percentage of sites available 
by domain dropped to 21.6 percent, 21.9 percent, and 23.9 
percent, respectively. 

Instructions for Authors 

Schilling and colleagues surveyed the instructions for author 
pages and Web sites of the one hundred highest-impact 
journals in science and medicine as determined by ISI's 
Journal Citation Reports. 28 They found one journal that 
discussed maintaining access to Internet-cited information, 
eleven journals that provided authors with examples for 
citing Internet references using digital object identifiers 
(DOIs), and thirty-six journals that requested dates with 
Internet citations. None of the journals required that their 
authors provide DOIs. 

Methodology 

The researchers searched for the content cited by the 
sample of 500 citations from articles published in library and 
information science journals in 1999 and 2000 used for the 
original study. 29 They began the search by looking for the 
content cited at the URL included in the citation. If they did 
not find it, they searched for it elsewhere on the Web. All 
citations were then searched in the Internet Archive. The 
methodology used and described in depth in the original 
study was employed in this follow-up study with one excep- 
tion. In the original study, the researchers searched a sec- 
ond time for those URLs for which they initially received a 
URL unavailable or file not found message. These were not 
searched a second time in the follow-up study. The statistical 
program SPSS 14.0 for Windows was used to generate contingency 
tables and calculate Pearson's Chi-Square values. A significance 
level of p < 0.05 was used for this study. 
The data for this follow-up study were collected between 
August 2005 and June 2006. The data for the original study 
were collected from January to July 2002. 

Findings 

Content Availability: URL Persistence 

URL persistence decreased by 17.4 percentage points during the period 
between the original and follow-up studies. These data 






are presented in table 1. In the original study, 282, or 56.4 
percent, of the sample URLs, were persistent; that is, they 
pointed to the cited content or to Web pages that referred 
or redirected the researcher to the cited content. Of the 500 
citations studied, the content cited by 213, or 42.6 percent, 
could not be found at the URLs included in the citations 
and, therefore, were considered to be impermanent. In the 
follow-up study, 195, or 39 percent, of the sample URLs 
were found to be persistent, and 305, or 61 percent, were 
impermanent. 



Content Availability: Accessibility on the Web 

Cited content was considered to be accessible if, after failing 
to find it at the URL included in the citation or at a referred 
page, the researchers were able to locate it elsewhere on the 
Web. In the original study, the researchers had to search for 
the content of 213 impermanent URLs, while in the follow- 
up study they had to search for 300. The results of these 
searches are presented in table 2. 

The researchers found the content cited by 3.8 percent of the 
impermanent URLs in the original study by truncating them, and by 4.2 
percent by identifying errors that, when corrected, led to the cited 
information. In the follow-up study, the percentage of content found by 
truncation dropped slightly to 3.0 percent, and the percentage found by 
correcting URL errors fell 2.2 percentage points to 2.0 percent. In the 
follow-up study, the researchers had relatively less success finding 
content by browsing and searching the Web site at the URL cited, and 
relatively more success using Google to locate content elsewhere on the 
Web. In the original study, they located the content cited in 25.4 
percent of the impermanent citations by browsing or searching the site 
to which the URL led them, while in the follow-up study this percentage 
was 21.3 percent. In the follow-up study, the researchers found the 
content cited by 30.7 percent of the impermanent URLs by using Google, 
an increase of 5.3 percentage points from the original study. 

The number of citations for which the cited content 
could not be found rose from 83, or 16.6 percent of the 500 
citations in the sample, in the original study, to 124, or 24.8 
percent of the sample, in the follow-up study. This represents 
an increase of 8.2 percentage points in unavailable content. 
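
The change in unavailable content can be read two ways, and a short calculation separates them. The sketch below uses only the counts reported above (83 and 124 of 500 citations). The variable and function names are illustrative, and the relative-change figure is a derived quantity that the study itself does not report.

```python
SAMPLE_SIZE = 500  # citations in the 1999/2000 sample

# Citations whose content could not be found on the Web (from the text above).
NOT_FOUND_ORIGINAL = 83
NOT_FOUND_FOLLOW_UP = 124


def share(count: int, total: int = SAMPLE_SIZE) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


original_rate = share(NOT_FOUND_ORIGINAL)
follow_up_rate = share(NOT_FOUND_FOLLOW_UP)

# Difference between the two rates, in percentage points.
point_change = follow_up_rate - original_rate

# Growth in the number of unavailable citations relative to the original count.
relative_change = 100.0 * (NOT_FOUND_FOLLOW_UP - NOT_FOUND_ORIGINAL) / NOT_FOUND_ORIGINAL

print(f"original: {original_rate:.1f}%  follow-up: {follow_up_rate:.1f}%")
print(f"change: {point_change:.1f} percentage points")
print(f"relative increase in unavailable citations: {relative_change:.1f}%")
```

The 8.2 figure is the difference between the two rates; measured against the original count of 83, the number of unavailable citations grew by roughly half.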

Content Availability: Internet Archive 

The researchers searched the Internet Archive using the 
Wayback Machine (www.archive.org/index.php) to deter- 
mine if the URLs included in the sample citations had 
been archived. The results are presented in table 3. The 
percentages of all citations and of those that were consid- 
ered persistent that the researchers were able to find in the 



Table 1. Content availability: URL persistence in original and follow-up studies 

Content at cited URL     Original study URLs   Follow-up study URLs   Change 
or at referred page         No.        %          No.        %           % 
Found                       282      56.4         195      39.0       -17.4 
Not found                   213      42.6         305      61.0       +18.4 
Could not determine           5       1.0                              -1.0 
Total                       500     100.0         500     100.0 





Table 2. Content availability: accessibility on the Web in the original and follow-up studies 

                                           Original study URLs   Follow-up study URLs   Change 
Content on Web                                No.        %          No.        %           % 
Found by truncating URL                         8       3.8           9       3.0        -.8 
Found by browsing and searching Web site       54      25.4          64      21.3       -4.1 
Found by correcting error in URL                9       4.2           6       2.0       -2.2 
Found by using Google                          54      25.4          92      30.7       +5.3 
Not found                                      83      39.0         124      41.3       +2.3 
Could not determine                             5       2.3           5       1.7        -.6 
Total                                         213     100.0         300     100.0 








Internet Archive changed very little from the original to the 
follow-up study. The largest difference was in the accessible 
content category. In the follow-up study the researchers 
found 64.3 percent of these URLs in the Internet Archive, 
almost 14 percentage points more than in the original study. 

The researchers were able to access 39, or 47.0 percent, of the 83 
citations that they could not find at the URL cited or elsewhere on the 
Web in the original study by using the Wayback Machine. This raised the 
overall availability rate of the cited content from 81.4 percent to 89.2 
percent, an increase of 7.8 percentage points. In the follow-up study, 
they found 63, or 50.8 percent, of the 124 citations in the "Content Not 
Found" category, raising the overall availability of cited content from 
68.0 percent to 80.6 percent, an increase of 12.6 percentage points. 
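
The contribution of the Internet Archive can be checked from the counts given in this section. The sketch below computes, for each study, the share of "not found" citations recovered through the Wayback Machine and the resulting gain in overall availability measured against the 500-citation sample. The function and dictionary names are illustrative, not the authors'.

```python
SAMPLE_SIZE = 500  # citations in the sample

# Counts reported in the text: citations not found on the Web, and how many
# of those were subsequently located in the Internet Archive.
STUDIES = {
    "original": {"not_found_on_web": 83, "found_in_archive": 39},
    "follow-up": {"not_found_on_web": 124, "found_in_archive": 63},
}


def archive_gain(not_found_on_web: int, found_in_archive: int,
                 sample_size: int = SAMPLE_SIZE) -> tuple[float, float]:
    """Return (recovery rate among not-found citations,
    gain in overall availability in percentage points)."""
    recovery_rate = 100.0 * found_in_archive / not_found_on_web
    point_gain = 100.0 * found_in_archive / sample_size
    return round(recovery_rate, 1), round(point_gain, 1)


for name, counts in STUDIES.items():
    recovery, gain = archive_gain(**counts)
    print(f"{name}: {recovery}% recovered, +{gain} points overall")
```

Both recovery rates sit near one half, while the gain in overall availability is larger in the follow-up study because more citations had been lost from the Web by then.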

Changing Categories 

The researchers ran cross-tabulations to determine the 
extent to which the citations in the sample changed catego- 
ries between the studies. Table 4 presents the status in the 
follow-up study of the URLs that were persistent, accessible, 
not found, or could not be determined in the original study. 
Of the 282 citations that were persistent in the original study, 
188 were found to be persistent in the follow-up study, and 
all but 37 of the remaining 94 were found elsewhere on the Web by 
truncating the URL (1 citation), browsing and searching the 
site (19 citations), or using Google (37 citations). Four of the 
8 citations that were found by truncating the URL in the 
original study were found using this method in the follow-up 
study, one was found by browsing and searching the Web 



site, and another was found by using Google. Thirty-two of 
the 54 citations that were found by browsing and searching 
the Web site in the original study also were found using that 
method in the follow-up study, while 15 were found using 
Google, and 4 were not found. The researchers found 29 of 
the 54 citations for which they had to use Google in the origi- 
nal study by using Google in the follow-up study. They could 
not find the content cited by 14 of those 54 citations. 

The researchers expected that many of the citations in 
the sample would, over time, move from persistent to acces- 
sible to not found. However, 20 progressed in the opposite 
direction. These citations are identified with an asterisk in 
table 4. In 5 cases, URLs that were only accessible in the 
original study were found to be persistent in the follow-up 
study. This group includes 2 citations whose content was 
found in the original study by truncating the URL, 2 whose 
content was found by using Google, and 1 whose content was found by 
browsing and searching the Web site. In addition, 2 of the 83 citations 
in the not found category in the original study were found 
to be persistent in the follow-up study, and 13 others were 
found to be accessible by truncating the URL (1 citation), 
browsing and searching the Web site (4 citations), or using 
Google (8 citations). 

The researchers examined the 20 citations that moved 
from accessible to persistent and from not found to per- 
sistent or accessible to try to identify patterns that would 
explain this improvement in availability. They found that 
when they searched for 12 of these citations in the original 
study, they initially received a "URL not available" mes- 
sage. This required that they wait a week and then search 



Table 3. Content availability: Internet Archive in original and follow-up studies 

Accessible in             All citations    Persistent URLs   Accessible content   Content not found 
Internet Archive           No.      %        No.      %         No.      %           No.      % 
Original study 
  Found                    344    68.8       239    84.8         66    50.8          39    47.0 
  Not found                146    29.2        42    14.9         60    46.2          44    53.0 
  Could not determine       10     2.0         1      .4          4     3.0                 0.0 
  Total                    500   100.0       282   100.1        130   100.0          83   100.0 
Follow-up study 
  Found                    340    68.0       166    85.1        110    64.3          63    50.8 
  Not found                156    31.2        28    14.4         61    35.7          61    49.2 
  Could not determine        4      .8         1      .5 
  Total                    500   100.0       195   100.0        171   100.0         124   100.0 






the URL again. The researchers found the content cited by 
3 of these on the Web but not at the URL in the citation. 
In the follow-up study, their status changed from acces- 
sible to persistent because when the researchers looked 
for the content at the URL included in the citation, they 
were referred to a page that contained that content. The 
researchers also initially received a "URL not available" 
message in the original study for the remaining 9 of these 
12 citations, but when they waited a week and searched 
again they were not able to find the cited content any- 
where on the Web. In the follow-up study the researchers 
did find the content, but not at the URL included in the 
citation and not at a referred page. Therefore, the status of 
these cases changed from not found to accessible. Based 
on these findings, the improvement in availability appears 
to be the result of referrals and automatic redirects not 
in place during the original study. Improvements in Web 
site indexing capabilities with more sites offering tools 
to search their sites and enhancements to Google's Web 
search features also may have affected availability. Finally, 
in some cases, the improvement may be simply because 
the Web sites containing the cited content were not work- 
ing at the time the original study was conducted, but were 
functioning during the follow-up study. 

The cross-tabulation of cited content availability in 
the Internet Archive in the original and follow-up studies is 



presented in table 5. Fifty of the 344 URLs found in the 
Internet Archive in the original study no longer led to the 
cited content when the follow-up study was conducted. The 
content for 46 of the 146 citations that were not found in 
the Archive in 2002 was found in the Archive during the 
follow-up study. 

Characteristics Associated with Availability 

The researchers ran a series of cross-tabulations to identify 
the characteristics of the cited URLs that could be asso- 
ciated with URL persistence and content availability on 
the Web and in the Internet Archive. Chi-Square Tests of 
Independence were performed to identify the statistically 
significant relationships. In order to run these tests, the 
researchers had to filter out some of the cases in the could 
not determine category and reclassify some of the variable 
values into broader categories. The results of the Chi-Square 
tests for the original and the follow-up studies are presented 
in table 6, and those that are significant at the p < 0.05 
level are identified with an asterisk. Cross-tabulations were 
not run on citation content and availability in the Internet 
Archive variables, as the Wayback Machine only accepts 
URLs and, therefore, the presence or absence of additional 
bibliographic information in the citation could not affect, or 
be associated with, availability in the archive. 



Table 4. Category changes between original and follow-up study: persistence and accessibility 



Disposition in the Follow-up Study 



Category 

Persistent — Found at 
URL cited 

Accessible — Found by 
truncating URL 

Accessible — Found by 
browsing and searching 
Web site 

Accessible — Found by 
correcting error in URL 

Accessible — Found by using 
Google 

Not found 

Could not determine 

Total 



Original 
study total 

282 



54 



54 

83 
10 
500 



Persistent- 
found at 
URL cited 

188 



1* 



2* 

2* 

195 



Accessible- 
Accessible- found by Accessible- Accessible- 



found by browsing found by 
truncating and search- correcting 
URL ing Web site error in URL 



1 



19 



32 



4* 


64 







found 
by using 
Google 

37 



15 



29 




92 



Not found 

37 




14 




124 



Could not 
determine 



o 

10 
10 



* moved from accessible to persistent or from not found to persistent or accessible. 






In the original study, the Chi-Square tests indicated that 
citation content, domain, and URL directory depth were 
associated with content availability. Specifically, the amount 
of information in the citation, implied domain, and direc- 
tory depth were associated with content found at the URL 
cited or at a referred page; that is, persistence. Original and 
implied domain, as well as directory depth, were associ- 
ated with content that was either found at the URL cited 
or elsewhere on the Web; that is, they were "persistent or 
accessible." Finally, original domain and directory depth 
were found to be associated with content availability in the 
Internet Archive. Although domain, directory depth, and 
citation content characteristics were found to be associated 
with content availability in the follow-up study, they are not 
associated with the same types of availability as in the origi- 
nal study. In addition, the source journal (print or e-journal) 
and page type (navigation or content) were found to be 
associated with availability in the follow-up study, although 
not in the original study. 



The cross-tabulations for the characteristics with signifi- 
cant Chi-Square values are presented in table 7. The cross- 
tabulation between source journal and persistence indicates 
that the URLs in the sample citations taken from print jour- 
nals were found to be persistent more often than the URLs 
in citations taken from journals that were published only in 
electronic format. Specifically, 41.5 percent of the citations 
in the sample that came from print journals were found 
to be persistent, while the persistence rate for URLs from 
electronic-only journals was only 20.5 percent. 
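
The association between source journal and persistence can be illustrated by recomputing the Pearson chi-square statistic from the cross-tabulated counts. The study used SPSS; the pure-Python sketch below is only an illustration of the test, not the authors' procedure. The counts come from table 7, except the number of persistent e-journal citations, which is taken here as 8 (39 citations in total minus 31 not persistent); that figure is a derived assumption. The closed-form p-value applies only to tests with one degree of freedom.

```python
import math


def pearson_chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_one_df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: print journal, e-journal only. Columns: persistent, not persistent.
# The e-journal persistent count (8) is derived as 39 - 31, an assumption.
observed = [
    [187, 264],
    [8, 31],
]

chi2, df = pearson_chi_square(observed)
p = p_value_one_df(chi2)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
```

Under these assumptions the statistic should agree with the follow-up value reported for source journal in table 6 (6.576 with 1 degree of freedom, p = .010), which falls below the p < 0.05 threshold used in the study.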

Page type and directory depth also were found to be 
associated with persistence. More than 52 percent of the 
content cited by navigation pages was found at the URL 
included in the citation, in comparison to 35 percent of the 
content cited by content pages. In general, persistence rates 
decreased as the directory depth increased. More than 62 
percent of the content was found at the URL cited when 
the URL was at the server level, and only 13.3 percent of the 
content was found at the URL cited when the URL was at 



Table 5. Category changes between original and follow-up study: availability in Internet Archive 

                                         Disposition in the Follow-up Study 
                         Original        Found in     Not found     Could not 
Category                 study total     Archive      in Archive    determine 
Found in Archive             344           294            50 
Not found in Archive         146            46           100 
Could not determine           10                           6             4 
Total                        500           340           156             4 



Table 6. Summary of Pearson's Chi-Square (χ²) values: citation characteristics and content availability in original and follow-up studies 

Persistent 
                         Original study           Follow-up study 
Characteristics         df      χ²       p       df      χ²       p 
Source journal           1     .559    .455       1    6.576    .010* 
Content                  2   10.050    .007*      2    4.665    .097 
Date viewed              1    2.952    .086       1    1.152    .283 
Original domain          5   10.780    .056       4    7.103    .131 
Implied domain           5   18.784    .002*      4    8.133    .087 
Directory depth          5   14.165    .015*      5   27.190    .000* 
Page type                1    2.879    .090       1   12.101    .000* 
Tilde (~) included       1    1.832    .176       1     .199    .656 

Persistent or accessible 
                         Original study           Follow-up study 
Characteristics         df      χ²       p       df      χ²       p 
Source journal           1     .073    .787       1    2.207    .137 
Content                  2    1.123    .570       2    6.544    .038* 
Date viewed              1     .082    .775       1     .013    .909 
Original domain          5   11.910    .036*      4   19.478    .001* 
Implied domain           5   21.821    .001*      4   23.613    .000* 
Directory depth          5   12.738    .026*      5    7.226    .204 
Page type                1     .000    .992       1     .359    .549 
Tilde (~) included       1     .334    .563       1     .864    .353 

Archived 
                         Original study           Follow-up study 
Characteristics         df      χ²       p       df      χ²       p 
Source journal           1     .754    .385       1     .070    .792 
Content                       DNA                      DNA 
Date viewed              1     .967    .326       1     .348    .555 
Original domain          5   11.524    .042*      4   10.364    .035* 
Implied domain           5    8.165    .147       4   14.245    .007* 
Directory depth          5   11.572    .041*      5   10.646    .059 
Page type                1    1.334    .248       1    5.455    .020* 
Tilde (~) included       1    1.237    .266       1     .756    .384 

* significant at the p < 0.05 level 






Table 7. Cross-tabulations: citation characteristics and content availability in the follow-up study 

                                       Persistent       Not persistent        Total 
Characteristic                         No.      %        No.      %        No.      % 
Source journal (N = 490) 
  Print journal                        187    41.5       264    58.5       451   100.0 
  E-journal only                         8    20.5        31    79.5        39   100.0 
Directory depth (N = 490) 
  Server level                          46    62.2        28    37.8        74   100.0 
  1                                     25    40.3        37    59.7        62   100.0 
  2                                     49    36.0        87    64.0       136   100.0 
  3                                     39    33.9        76    66.1       115   100.0 
  4                                     32    43.8        41    56.2        73   100.0 
  5 or more                              4    13.3        26    86.7        30   100.0 
Page type (N = 490) 
  Navigation page                       71    52.2        65    47.8       136   100.0 
  Content page                         124    35.0       230    65.0       354   100.0 

                                       Available        Not available 
                                       on the Web       on the Web            Total 
Characteristic                         No.      %        No.      %        No.      % 
Citation content (N = 490) 
  URL only                              18    62.1        11    37.9        29   100.0 
  URL & partial bibl. info.            181    71.8        71    28.2       252   100.0 
  URL & complete bibl. info.           167    79.9        42    20.1       209   100.0 
Original domain (N = 490) 
  Commercial                            56    58.9        39    41.1        95   100.0 
  Education                             77    81.9        17    18.1        94   100.0 
  Government                            37    82.2         8    17.8        45   100.0 
  Organization                          89    81.7        20    18.3       109   100.0 
  Geographic designation and other     107    72.8        40    27.2       147   100.0 
Implied domain (N = 490) 
  Commercial                            64    58.7        45    41.3       109   100.0 
  Education                            139    82.2        30    17.8       169   100.0 
  Government                            45    75.0        15    25.0        60   100.0 
  Organization                          97    80.8        23    19.2       120   100.0 
  Geographic designation and other      21    65.6        11    34.4        32   100.0 

                                       Available in the   Not available in the 
                                       Internet Archive   Internet Archive       Total 
Characteristic                         No.      %          No.      %         No.      % 
Original domain (N = 496) 
  Commercial                            53    55.8          42    44.2         95   100.0 
  Education                             72    76.6          22    23.4         94   100.0 
  Government                            31    68.9          14    31.1         45   100.0 
  Organization                          76    69.7          33    30.3        109   100.0 
  Geographic designation and other     108    70.6          45    29.4        153   100.0 
Implied domain (N = 496) 
  Commercial                            62    56.4          48    43.6        110   100.0 
  Education                            130    76.9          39    23.1        169   100.0 
  Government                            42    67.7          20    32.3         62   100.0 
  Organization                          84    70.6          35    29.4        119   100.0 
  Geographic designation and other      22    61.1          14    38.9         36   100.0 
Page type (N = 496) 
  Navigation page                      104    76.5          32    23.5        136   100.0 
  Content page                         236    65.6         124    34.4        360   100.0 






level five or lower. However, for URLs at level four, the per- 
sistence rate (43.8 percent) was higher than that for citations 
containing URLs at level one (40.3 percent). This suggests 
that some other variable may be affecting the relationship 
between directory depth and persistence. This pattern also 
was observed in the original study, where directory level was 
found to be associated with availability on the Web and in 
the Internet Archive, but where page type was not found to 
be associated with availability in either place. 

Citations containing complete bibliographic informa- 
tion along with the URL were more likely to lead to available 
content than those with only partial bibliographic informa- 
tion or those that contained only URLs. Almost 80 percent 
of the citations with complete bibliographic information led 
to content that was either at the URL cited or accessible 
elsewhere on the Web. When the citation included only 
partial bibliographic information, this percentage dropped 
to 71.8 percent; for citations that included only URLs, the 
availability rate was only 62.1 percent. 

The cross-tabulations of domains with content availability 
indicated that content cited by URLs with original 
domains of .gov, .edu, and .org was more likely to be available 
and found in the Internet Archive than content cited by 
URLs with .com or other types of domains. Between 81.7 
and 82.2 percent of the content cited by URLs residing on 
education, government, and organization servers was found 
either at the URL included in the citation or elsewhere on 
the Web. Less than 60 percent of the content cited by URLs 
residing on commercial servers was found to be available. 
For content found in the Internet Archive, the pattern was 
similar. The researchers found 76.6 percent of the content 
cited by URLs residing on education servers and only 
55.8 percent of the content cited by URLs on commercial 
servers. URLs in the combined category include those on 
military and network servers and those with geographic 
designations. Content availability in the Internet Archive 
for citations with URLs in this combined category was 70.6 
percent, which is higher than the percentages of available 
content for citations containing government and organiza- 
tion URLs. 

The cross-tabulations indicated that content availability 
on the Web and in the Internet Archive also was associated 
with implied domain. The original domains in the sample 
citations' URLs were translated into implied domains by 
folding the country code top-level domains that included 
information about the organization hosting their content into 
the appropriate generic top-level domains. This reduced the 
number of sample citations in the "Geographic Designation 
and Other" category and increased the number in the other 
four domain categories. The citation URLs with implied 
education domains were found to have the most available 
content (82.2 percent), and those with commercial implied 
domains had the least available content (58.7 percent).
Three-quarters of the content cited by URLs with government
implied domains and 80.8 percent of content cited by
those with organization implied domains were found to be 
available. This pattern of content availability also was found 
in the Internet Archive. The content cited by URLs in the 
education implied domain was found to be most accessible 
(76.9 percent), followed by URL citations with organiza- 
tion (70.6 percent) and government (67.7 percent) implied 
domains. 
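The folding of country code domains into implied domains described above can be sketched in a few lines of Python. This is a minimal illustration rather than the researchers' actual procedure: the article does not list the rules it applied, so the second-level labels in `SECOND_LEVEL` (such as `ac` and `co`) and the function name `implied_domain` are assumptions chosen for the example.

```python
from urllib.parse import urlparse

# Generic top-level domains that map directly to a domain category.
GENERIC = {
    "com": "commercial",
    "edu": "education",
    "gov": "government",
    "org": "organization",
}

# Assumed second-level labels under country code domains that reveal
# the type of hosting organization (hypothetical list, for illustration).
SECOND_LEVEL = {
    "ac": "education",
    "edu": "education",
    "co": "commercial",
    "com": "commercial",
    "gov": "government",
    "org": "organization",
}

OTHER = "geographic designation and other"


def implied_domain(url):
    """Return the implied domain category for a cited URL."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    tld = labels[-1]
    if tld in GENERIC:
        return GENERIC[tld]
    # Two-letter TLDs are country codes; fold them when the label to the
    # left identifies the kind of organization hosting the content.
    if len(tld) == 2 and len(labels) >= 3 and labels[-2] in SECOND_LEVEL:
        return SECOND_LEVEL[labels[-2]]
    return OTHER
```

Under these assumed rules, a URL on `www.ox.ac.uk` moves from the geographic category to education, while a URL on a `.de` host with no organizational label stays in the combined category, which matches the shift in category counts the paragraph describes.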

The Chi-Square test also indicated that page type 
was found to be associated with availability in the Internet 
Archive. More than three-quarters of the navigation pages 
(76.5 percent) were found in the Internet Archive, while 
65.6 percent of the content pages were found there. 
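The association between page type and availability in the Internet Archive can be checked from the page type counts in the accompanying table (104 of 136 navigation pages and 236 of 360 content pages found). The sketch below computes an uncorrected Pearson chi-square statistic from those counts. The article reports neither the statistic nor whether a continuity correction was applied, so the choice of the uncorrected form and the 0.05 significance level are assumptions made for illustration.

```python
def pearson_chi_square(table):
    """Uncorrected Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Page type counts: (found, not found) in the Internet Archive, N = 496.
observed = [
    [104, 32],   # navigation pages
    [236, 124],  # content pages
]

chi2 = pearson_chi_square(observed)
CRITICAL_05_DF1 = 3.841  # chi-square critical value for 1 degree of freedom, alpha = 0.05

print(f"chi-square = {chi2:.2f}; exceeds critical value: {chi2 > CRITICAL_05_DF1}")
```

With one degree of freedom, a statistic above the critical value is consistent with the reported association between page type and availability in the Internet Archive.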

Instructions for Authors 

In the original study, the researchers reviewed the instruc- 
tions for authors published by the journals from which the 
sample was drawn for the period of the study (1999-2000), 
and again as that manuscript was being prepared, in order 
to determine if these journals had established policies or 
instructions on citing content on the Web. In the original 
study, only 6 of the 34 journals included examples of cita- 
tions to electronic resources for authors to follow. Three 
of these also provided further instructions on citing Web 
resources. One additional title referred authors citing con- 
tent on the Web to the American Psychological Association's 
Web site (APAStyle.org). Fifteen of the journals referred 
authors to the fourteenth edition of The Chicago Manual 
of Style, which, having been published in 1993, does not 
address references to digital resources. None of the instruc- 
tions for authors addressed Web site persistence. 

By the time the follow-up study was conducted, 2 of 
the 34 journals had ceased publication, and 2 had merged. 
Of the 31 remaining journals, 12 included examples of 
citations to electronic resources in their instructions to the 
authors. Only 5 of the journals instructed authors to use the 
fourteenth edition of The Chicago Manual of Style, whereas 
6 referred authors to the fifteenth edition, which includes 
instructions for citing content on the Web. Three of the 31 
journals included some mention of Web site instability, with 
1 journal requiring authors to include digital object identi- 
fiers in citations to journal articles on the Web. 

Summary and Conclusion 

Persistence, as measured by the number of URLs in the 
sample citations that led directly or through a referred 
page to the cited content, degraded by 17.4 percent in the 
three years between the original and the follow-up studies. 
During the follow-up study, when searching for content not
found at the URL cited, the researchers had relatively less
success searching the cited Web site and more success using 
Google, suggesting that the cited content was less closely 
associated with the Web site on which it resided at the time 
of the original study. Overall, cited content not found either 
at the URL included in the citation or elsewhere on the Web 
increased from 16.6 percent in the original study to 24.8 per- 
cent in the follow-up. Some cited content in this not-found 
category was accessible in the Internet Archive. When the 
content found only in the Internet Archive is factored in, the 
percentage of content available either on the Web or in the 
Internet Archive became 89.2 percent in the original study, 
but dropped to 80.6 percent in the follow-up study. 

The researchers expected availability to degrade from 
persistent to accessible to not found over time, given the 
dynamism of the Web. However, some content moved 
in the opposite direction; that is, it was not persistent or 
accessible in the original study, but was found to be so 
in the follow-up study. These instances provide further 
evidence of the volatility of content published on the Web 
and substantiate the "but it was [or wasn't] there yesterday" 
experience. The results of searching the sample URLs in 
the Internet Archive also provide evidence of this volatil- 
ity. Contrary to the permanence and stability suggested by 
the term "archive," this study demonstrates that content 
appears in and disappears from the Internet Archive.
Fifty, or 14.5 percent, of the 344 URLs found in this 
archive in the original study were not found in the Internet 
Archive when the follow-up study was conducted, while 
46, or 31.5 percent, of those not found in the Internet 
Archive in the original study, were in that archive during the 
follow-up study. 

Two characteristics, citation content and implied 
domain, were associated with persistence in the original 
study, but not the follow-up. The only citation characteristic 
that remained associated with persistence through both 
studies was directory depth. In general, citations with URLs 
at the server level were the most likely to be persistent; as 
the directory depth increased, persistence decreased. An 
anomaly at the fourth level also appeared in the original 
study and may suggest that the relationship between direc- 
tory depth and persistence is not linear. 

Page type and source journal were found to be associ- 
ated with persistence in the follow-up study, but not in the 
original. The findings that page type, which is derived from 
directory depth, is associated with persistence and that 
URLs pointing to navigation pages are more likely to be 
persistent than those pointing to content pages are logical. 
In contrast, the finding that citations from print journals are 
more often persistent than those from journals published 
only in electronic format is not easily interpreted. This find- 
ing suggests that article authors in electronic-only journals 
chose, or needed, to cite content that was less persistent
than the content cited by authors who published in print
journals. The researchers do not believe that, in terms of 
policy value, this is a useful finding. 

The characteristics that were associated with availabil- 
ity — that is, with persistence or accessibility elsewhere on 
the Web — in the follow-up study were citation content, 
original domain, and implied domain. Directory depth was 
associated with availability in the original study, but not in 
the follow-up. Citation content was found to be inversely 
associated with persistence in the original study. In reporting
the results of that study, the researchers hypothesized
that the inverse relationship between the amount of infor- 
mation in the citation and persistence was a consequence of 
the study methodology: 

When searching the Web for the content cited 
by "URL only" or "URL and partial bibliographic 
information" citations, the researchers, having little 
or no bibliographic information to provide evidence 
to the contrary, may have tended to accept the Web 
page that was retrieved as containing the content 
the author cited. In contrast, when they were 
working with "URL and complete bibliographic 
information" citations the researchers were able to 
determine with certainty whether or not they had 
found the cited content. 30

The researchers employed this same methodology in 
the follow-up study, but found a direct relationship between 
the citation's completeness of information and cited content 
availability on the Web. Although this finding is more logi- 
cal than the inverse relationship found in the original study, 
the methodological limitation remains; as a consequence, 
persistence rates may be overstated in this study as well as in 
the original. Future researchers could compensate for this 
limitation by consulting the source text to determine if they 
have found the cited content when searching incomplete 
citations or by limiting their samples to citations that include 
both URLs and complete bibliographic information. 

The follow-up study underscores the association 
between domain, especially implied domain, and the likeli- 
hood that cited content will be found by future researchers. 
Original and implied domain were associated with persis- 
tence or accessibility in both studies and with availability in 
the Internet Archive in the follow-up study. Content cited 
on education, government, and organization servers was
more likely to be found at the URL included in the citation
or elsewhere on the Web, and in the Internet Archive, than
content on commercial or other types of servers. 

The follow-up study indicates that journal editors are 
providing more instruction to their authors regarding citing
content on the Web. Instructions to authors are more likely 
to include examples of citations to Web sites and referrals to
style guides that prescribe citations with full bibliographic
information, URLs, and dates viewed. 

This study provides further documentation of the 
decline in citation persistence to content on the Web over 
time. Beyond that, by applying statistical tests of significance
to citation characteristics related to persistence and
availability, it provides strong support for the findings of 
previous research that were based on descriptive statistical
analyses. This study supports the various findings by
Koehler, Dimitrova and Bugeja, and Wren and colleagues
of higher persistence rates of URLs at, or near, a Web
site's directory level. 31 It also supports a number of studies,
including those by Bugeja and Dimitrova, Bar-Ilan and
Peritz, Rumsey, and Tan, Foo, and Hui, that found URLs in
the education domain to be more persistent or stable
than URLs in other domains. 32 However, other studies have
found .gov sites to be the most stable, and more research 
is needed to sort out the relationship between domain and 
persistence. 33 

The original study and this follow-up are unlike other 
studies of persistence in that the researchers looked for the 
content cited by their sample's URLs in the Internet Archive. 
The studies' findings provide strong evidence that the 
Internet Archive is not a reliable source for cited content
that is no longer available on the Web and underscore the 
need for both technical solutions and peer policies to address 
the problems associated with URL persistence. 34 The follow- 
up study confirms the importance of many of the recom- 
mendations made by the researchers based on the findings of 
the original study. Specifically, the professions and academic 
disciplines need to develop new citation conventions; journal 
publishers and editors need to better instruct authors about 
citing content and enforce new conventions as they are 
established; and authors, editorial staff, and publishers need 
to work together to develop and implement technologies to 
ensure that cited content is preserved and remains accessible 
to researchers and students over the long term. 

References 

1. Mary F. Casserly and James E. Bird, "Web Citation Availability: 
Analysis and Implications for Scholarship," College & Research
Libraries 64, no. 4 (July 2003): 300-17. 

2. Ibid., 300-302. 

3. Carmine Sellitto, "The Impact of Impermanent Web-located 
Citations: A Study of 123 Scholarly Conference Publications," 
Journal of the American Society for Information Science and 
Technology 56, no. 7 (May 2005): 695-703; Carmine Sellitto, 
"A Study of Missing Web-Cites in Scholarly Articles: Towards 
an Evaluation Framework," Journal of Information Science 30,
no. 6 (2004): 484-95. 

4. Jonathan D. Wren et al., "Uniform Resource Locator Decay 
in Dermatology Journals: Author Attitudes and Preservation 
Practices," Archives of Dermatology 142, no. 9 (Sept. 2006): 
1147-52. 



5. Robert P. Dellavalle et al., "Going, Going, Gone: Lost Internet 
References," Science 302, no. 5646 (Oct. 31, 2003): 787-88. 

6. Diomidis Spinellis, "The Decay and Failures of Web 
References," Communications of the ACM 46, no. 1 (Jan. 
2003): 71-77. 

7. Daniela V. Dimitrova and Michael Bugeja, "Consider 
the Source: Predictors of Online Citation Permanence in 
Communication Journals," portal: Libraries and the Academy 
6, no. 3 (2006): 269-83. 

8. Renee Crichlow, Stefanie Davis, and Nicole Winbush, 
"Accessibility and Accuracy of Web Page References in 5 
Major Medical Journals," Journal of the American Medical 
Association 292, no. 22 (Dec. 8, 2004): 2723-24. 

9. John Markwell and David W. Brooks, "'Link Rot' Limits 
the Usefulness of Web-Based Educational Materials in 
Biochemistry and Molecular Biology," Biochemistry and 
Molecular Biology Education 31, no. 1 (Jan. 2003): 69-72.

10. Jose Luis Ortega, Isidro Aguillo, and Jose Antonio Prieto, 
"Longitudinal Study of Content and Elements in the Scientific 
Web Environment," Journal of Information Science 32, no. 4
(2006): 344-51. 

11. Michael Bugeja and Daniela V. Dimitrova, "Exploring the Half- 
Life of Internet Footnotes," Iowa Journal of Communication 
37, no. 1 (Spring 2005): 77-86. 

12. Judit Bar-Ilan and Bluma C. Peritz, "Evolution, Continuity, 
and Disappearance of Documents on a Specific Topic on the 
Web: A Longitudinal Study of 'Informetrics,'" Journal of the
American Society for Information Science and Technology 55, 
no. 11 (Sept. 2004): 980-90. 

13. Wallace Koehler, "A Longitudinal Study of Web Pages 
Continued: A Consideration of Document Persistence," 
Information Research 9, no. 2 (Jan. 2004), http://informationr.net/ir/9-2/paper174.html
(accessed June 6, 2007).

14. Jeffrey D. Kushkowski, "Web Citation by Graduate Students: 
A Comparison of Print and Electronic Theses," portal: 
Libraries and the Academy 5, no. 2 (2005): 259-76. 

15. Michelle M. Wu, "Why Print and Electronic Resources are 
Essential to the Academic Law Library," Law Library Journal
97, no. 2 (Spring 2005): 233-56, www.aallnet.org/products/ 
pubjlj_v97no2_2005-14.pdf (accessed June 6, 2007). 

16. Wendy Smith, "Still Lost in Cyberspace? Preservation 
Challenges of Australian Internet Resources," Australian 
Library Journal 54, no. 3 (2005), http://alia.org.au/publishing/ 
alj/54.3/full.text/smith.html (accessed June 6, 2007). 

17. Koehler, "A Longitudinal Study of Web Pages Continued." 

18. Spinellis, "The Decay and Failures of Web References." 

19. Wren et al., "Uniform Resource Locator Decay in Dermatology
Journals." 

20. Dimitrova and Bugeja, "Consider the Source." 

21. Wren et al., "Uniform Resource Locator Decay in Dermatology
Journals." 

22. Sellitto, "The Impact of Impermanent Web-located 
Citations." 

23. Markwell and Brooks, "'Link Rot' Limits the Usefulness 
of Web-Based Educational Materials in Biochemistry and
Molecular Biology." 

24. Dellavalle et al., "Going, Going, Gone." 



25. Bugeja and Dimitrova, "Exploring the Half-Life of Internet 
Footnotes." 

26. Dimitrova and Bugeja, "Consider the Source." 

27. Bar-Ilan and Peritz, "Evolution, Continuity, and Disappearance
of Documents on a Specific Topic on the Web."

28. Lisa M. Schilling et al., "Digital Information Archiving Policies
in High-Impact Medical and Scientific Periodicals," Journal 
of the American Medical Association 292, no. 22 (Dec. 8, 
2004): 2724-26. 

29. Casserly and Bird, "Web Citation Availability." 

30. Ibid., 315. 

31. Koehler, "A Longitudinal Study of Web Pages Continued"; 
Dimitrova and Bugeja, "Consider the Source"; Wren et
al., "Uniform Resource Locator Decay in Dermatology
Journals." 



32. Bugeja and Dimitrova, "Exploring the Half-Life of Internet 
Footnotes"; Bar-Ilan and Peritz, "Evolution, Continuity, and 
Disappearance of Documents on a Specific Topic on the Web";
Mary Rumsey, "Runaway Train: Problems of Permanence, 
Accessibility, and Stability in the Use of Web Sources in Law 
Review Citations," Law Library Journal 94, no. 1 (Winter 
2002): 27-39; Bing Tan, Schubert Foo, and Siu Cheung Hui, 
"Web Information Monitoring: An Analysis of Web Page 
Updates," Online Information Review 25, no. 1 (2001): 6-19. 

33. Dimitrova and Bugeja, "Consider the Source"; John Markwell 
and David W. Brooks, "Broken Links: The Ephemeral 
Nature of Educational WWW Hyperlinks," Journal of Science 
Education and Technology 11, no. 2 (June 2002): 105-08. 

34. Steve Lawrence et al., "Persistence of Web References in 
Scientific Research," Computer 34, no. 2 (Feb. 2001): 26-31. 






54 



LRTS 52(1) 



Jim LeBlanc (JDL8@cornell.edu) is Head, 
Database Management Services, and 
Martin Kurth (MK168@cornell.edu) is 
Director, Discovery Systems and Services, 
both at Cornell University Library, Ithaca, 
New York. 



Submitted February 28, 2007; returned 
to authors April 14, 2007 for revision and 
second review; resubmitted July 7, 2007 
and accepted for publication. 



An Operational Model 
for Library Metadata 
Maintenance 

By Jim LeBlanc and Martin Kurth 

Libraries pay considerable attention to the creation, preservation, and transfor- 
mation of descriptive metadata in both MARC and non-MARC formats. Little 
evidence suggests that they devote as much time, energy, and financial resources 
to the ongoing maintenance of non-MARC metadata, especially with regard to 
updating and editing existing descriptive content, as they do to maintenance of 
such information in the MARC-based online public access catalog. In this paper, 
the authors introduce a model, derived loosely from J. A. Zachman's framework 
for information systems architecture, with which libraries can identify and inven- 
tory components of catalog or metadata maintenance and plan interdepartmental, 
even interinstitutional, workflows. The model draws on the notion that the exper- 
tise and skills that have long been the hallmark for the maintenance of libraries' 
catalog data can and should be parlayed towards metadata maintenance in a 
broader set of information delivery systems. 

Librarians know how to maintain catalog data. Since the days of using industrial- 
grade erasers to correct and update information on catalog cards, they have 
made maintaining catalogs an important part of their business to ensure that the 
contents of the surrogate bibliographic records they present to users are complete 
and accurate. In spite of this history of catalog maintenance, librarians have not 
yet given the same kind of focus to the catalog data in newer resource discovery 
systems — that is, to information in databases other than the online public access 
catalog (OPAC) and in metadata formats other than MARC. This lack of attention 
to the integrity of these new catalogs is not necessarily intentional. Those respon- 
sible for the general upkeep of digital collections and the bibliographic metadata 
associated with these aggregates are often distributed throughout the library or 
even across multiple libraries, and they are not always the practitioners of tradi- 
tional library technical services. These keepers of non-MARC metadata are as 
likely to be found in library systems offices or in metadata services departments 
as in cataloging, catalog maintenance, or database management units. 

In a 2004 article on the redesign of database management (DBM) at Rutgers 
University Libraries, Bogan recounted the use of a core competency model to help 
identify those aptitudes and skills that most characterize traditional DBM staff. 1
She maintained that understanding these qualities and the values that underpin 
them allows technical services managers to reposition DBM staff to make useful 
contributions to the maintenance of a library's digital collections and the catalogs 
that describe them, and to work in the broader bibliographic infosphere for which 
libraries now create and maintain data in multiple resource discovery systems. At 
Rutgers, the DBM team defined its core competency and its role in the library 
as "fast and accurate maintenance and conversion of bibliographic and related 
metadata to support Rutgers University Libraries' resources." 2 Further, Bogan 
noted that: 



The expertise of DBM will be increasingly valu- 
able as metadata proliferates across an increasing 
number of repositories. DBM's knowledge-build- 
ing activities — shared problem solving, implemen- 
tation of new processes, experimentation, and 
importing knowledge — strengthen the unit's ability 
to respond quickly to emerging opportunities. Such 
opportunities are not likely to be radical shifts 
requiring rebuilding of skills from the ground up; 
rather, they will be logical extensions of the exper- 
tise embodied in the unit. 3 

If Bogan is right, and DBM staff are poised to be 
redeployed as keepers of libraries' metadata in multiple 
formats and multiple resource discovery systems, per- 
haps in association with staff in metadata, cataloging, and 
systems units, how might libraries go about identifying 
catalog maintenance needs and priorities in this expanded 
DBM sphere? In the following pages, the authors pro- 
pose an operational model with which to help answer this 
question. 

Catalog Maintenance versus 
Metadata Maintenance 

In a short piece published in 1986 in the RTSD Newsletter, 
Reid and Fiste outlined the new challenges for technical 
services managers in maintaining library catalogs in an 
online environment. 4 For the purposes of their argument, 
they defined catalog maintenance as "the total work involved 
in maintaining a card file of bibliographic records for public 
use — including addition, correction, and deletion of records, 
as well as production of a syndetic structure connecting 
individual records." 5 According to Reid and Fiste, database
maintenance, on the other hand, is "the comparable work 
done in maintaining a computerized file of bibliographic 
records," but which includes an expanded, more complex 
set of tasks: 

Sample database maintenance projects include 
elimination of duplicate records; deletion of alter- 
nate call numbers not used locally; addition of 
alternative title access for titles with abbreviations, 
initialisms, special characters, symbols, and num- 
bers; checking of filing indicators; removal of initial 
articles in fields lacking filing indicators; updat- 
ing fixed field data (e.g. imprint dates) previously 
ignored for card production; verification of hold- 
ings for given collections; determining if cancelled 
records were dropped during the tape load; and 
creation of authority files for verification purposes 
and cross-reference generation. 6 



Reid and Fiste concluded that the transition from a 
card to an online environment required a re-examination of 
catalog maintenance procedures and workflows, as well as 
new methods for compiling statistics and other management 
information. Though Reid and Fiste anticipated procedural 
changes, they did not talk directly about the impact of this 
evolution on DBM staff. As is now clear, although this tran- 
sition did require additional training for some DBM practi- 
tioners, the core competencies of these personnel were, for 
the most part, adequate to the task. In other words, the work 
of maintaining catalog data was not normally reallocated to 
other library staff just because of the change in medium for 
delivering the data. 

The twenty-first-century shift to metadata mainte- 
nance is no less complex and potentially disruptive, for 
with the expansion of standard library metadata formats to 
include non-MARC metadata, metadata experts (includ- 
ing traditional DBM practitioners) must contend with new 
variables — including converting data from one metadata 
scheme to another, establishing and maintaining seman- 
tic equivalents between metadata elements and values in 
different schemes, and exposing metadata for harvesting. 
Westbrooks commented on this complexity in the broader 
context of metadata management: 

Metadata management is the sum of activities 
designed to create, preserve, describe, maintain 
access to, and manipulate metadata, MARC and 
otherwise, that may be owned, aggregated, or 
distributed by the managing institution. These 
organizational and intellectual activities require 
the physical resources (web services, scripts and 
cross-walks), financial commitment (much like that 
already invested into OPACs), and policy planning 
that codifies the guiding framework within which 
metadata exists. 7
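Two of the variables named before the quotation, mapping between schemes and converting records from one scheme to another, can be illustrated with a short Python sketch. The crosswalk here is a deliberately simplified, hypothetical table from three MARC tags to Dublin Core element names. The record layout, the `CROSSWALK` name, and the `transform` function are assumptions made for the example; a production crosswalk would also need to handle subfields, indicators, and many more fields.

```python
# Hypothetical, simplified crosswalk: MARC tag -> Dublin Core element.
CROSSWALK = {
    "245": "title",
    "520": "description",
    "650": "subject",
}


def transform(marc_record):
    """Convert a MARC-like record, given as (tag, value) pairs, into a
    dict of Dublin Core elements. Repeated fields are kept as lists."""
    dc_record = {}
    unmapped = []
    for tag, value in marc_record:
        element = CROSSWALK.get(tag)
        if element is None:
            # Report tags the crosswalk does not cover instead of
            # silently dropping them.
            unmapped.append(tag)
        else:
            dc_record.setdefault(element, []).append(value)
    return dc_record, unmapped


record = [
    ("245", "An operational model for library metadata maintenance"),
    ("650", "Metadata"),
    ("650", "Cataloging"),
    ("300", "12 p."),
]
dc, unmapped = transform(record)
```

Returning the unmapped tags alongside the converted record is one way to surface gaps in the mapping, which is the kind of ongoing maintenance concern the surrounding discussion raises.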

While libraries do pay considerable attention to the cre- 
ation, preservation, and transformation of descriptive meta- 
data, little evidence exists that they devote as much time, 
energy, and financial resources to the ongoing maintenance 
of non-MARC metadata (especially with regard to updating 
and editing existing descriptive content) as they do to main- 
tenance of such information in the MARC-based OPAC. 

From a historical perspective, the number of main- 
tenance functions associated with ensuring the ongoing 
integrity of library catalogs has been increasing as the world 
of the library catalog has evolved. As a first step in model- 
ing metadata maintenance operations, the authors offer the 
following preliminary lists of typical maintenance functions 
for catalog records. These lists are not meant to be authori- 
tatively defined taxonomies, but to help illustrate how an 
operational model for metadata maintenance would work. 



In card catalogs, the range of maintenance tasks was 
relatively small: 

• Accrual — Filing new catalog cards. 

• Deletion — Removing existing cards. 

• Modification — Manually revising information on the 
cards (or producing revised versions of the cards). 

• Reporting — Compiling information regarding the 
cards. 

• Export — Photocopying the cards for printed catalogs 
(such as NUC). 

With the development of online catalogs, the number of 
data maintenance functions in which libraries could and did 
engage increased somewhat, as machine-readable records, 
unlike cards, could be moved around in cyberspace. They 
also could be turned on and off within the resource discov- 
ery system in which they were stored in order to make them 
available or unavailable to the public or other user groups. 
Thus, the scope of general maintenance tasks in an online 
environment widened to include the following: 

• Accrual — Adding new records. 

• Deletion — Removing existing records. 

• Modification — Revising data within records. 

• Reporting — Generating information regarding 
records. 

• Export — Copying selected records for other uses. 

• Migration — Transferring records from one integrated 
library system (ILS) to another. 

• Activation/deactivation — Making records available or 
unavailable to selected user groups. 

With scripting, construction of cross-walks, and other 
services to which Westbrooks referred, the number of main- 
tenance functions required to keep surrogate information in 
both MARC and non-MARC metadata catalogs clean and 
accurate has increased still further. The ten most obvious of 
these maintenance functions are: 

• Accrual — Adding new records. 

• Deletion — Removing existing records. 

• Modification — Revising data within records. 

• Transformation — Converting data from one metadata 
scheme to another. 

• Reporting — Generating information regarding 
records. 

• Export — Copying selected records for other uses. 

• Mapping — Establishing semantic equivalents 
between metadata elements or values in different 
schemes. 

• Migration — Transferring records from one system 
architecture to another. 



• Exposure — Making records available for harvesting. 

• Activation/deactivation — Making records available or 
unavailable to selected user groups. 

In metadata maintenance, all of these functions may 
conceivably come into play as the nature and content of 
digital objects or collections change. Librarians and library 
programmers already know how to perform this work for 
given targets, though how the various practitioners of this 
work must interact in the broader sphere of interrelated 
objects and collections is not always clear. The elements and 
values that represent these interactive relationships must be 
identified, defined, and codified in order to ensure the effi- 
cient functioning of the information system and the ongoing 
accuracy and integrity of the system's data. 
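The three lists of maintenance functions given above can be expressed as sets, which makes the growth from card catalogs to online catalogs to mixed MARC and non-MARC catalogs easy to inspect. The following Python sketch is only an illustration of that inventory: the names `MaintenanceFunction`, `CARD_CATALOG`, `ONLINE_CATALOG`, `METADATA_CATALOG`, and `added_functions` are hypothetical, and the descriptions paraphrase the lists in the text.

```python
from enum import Enum


class MaintenanceFunction(Enum):
    ACCRUAL = "adding new records"
    DELETION = "removing existing records"
    MODIFICATION = "revising data within records"
    TRANSFORMATION = "converting data from one metadata scheme to another"
    REPORTING = "generating information regarding records"
    EXPORT = "copying selected records for other uses"
    MAPPING = "establishing semantic equivalents between schemes"
    MIGRATION = "transferring records from one system to another"
    EXPOSURE = "making records available for harvesting"
    ACTIVATION_DEACTIVATION = "making records available or unavailable to user groups"


F = MaintenanceFunction

# Functions listed for each environment in the text.
CARD_CATALOG = frozenset(
    {F.ACCRUAL, F.DELETION, F.MODIFICATION, F.REPORTING, F.EXPORT}
)
ONLINE_CATALOG = CARD_CATALOG | {F.MIGRATION, F.ACTIVATION_DEACTIVATION}
METADATA_CATALOG = ONLINE_CATALOG | {F.TRANSFORMATION, F.MAPPING, F.EXPOSURE}


def added_functions(earlier, later):
    """Names of the functions present in `later` but not in `earlier`, sorted."""
    return sorted(f.name for f in later - earlier)


print(added_functions(CARD_CATALOG, ONLINE_CATALOG))
print(added_functions(ONLINE_CATALOG, METADATA_CATALOG))
```

An inventory of this kind could serve as the starting list of functions against which a library records who performs each one and how often.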

The Model 

Although not widely known within the library world, J. A.
Zachman's descriptive framework for information systems 
architecture (ISA) has been widely adopted by systems ana- 
lysts and database designers for use in businesses and insti- 
tutions in which technology and effort are distributed. 8 The 
ISA or Zachman framework examines entities and relation- 
ships within a given system in terms of six generic interroga- 
tives: what, where, who, when, why, and how. Underlying 
this description is an understanding that individual pieces 
of the overall framework must be tailored to specific stake- 
holder perspectives. Zachman uses an architectural example 
to illustrate how the values inherent in the owner's, the 
designer's, and the builder's points of view may differ with 
regard to a structure. Elements of the Zachman framework 
may thus vary in nature, terminology, and level of detail, 
depending on the stakeholders at whom the particular ele- 
ments are aimed. By identifying and associating elements in 
the system in this way, Zachman is able to construct a mul- 
tidimensional description of interrelationships among work 
teams and the tasks or products (or both) they deliver. 

Although the original ISA framework, and its later 
iterations, were intended as a basis on which to construct 
computer systems (that is, platform and software) architec- 
ture, Zachman's model also can be used to design workflows, 
both automated and manual, and to define variables and 
processes that can be used to inform strategic thinking, 
including planning and decision-making. 9 What follows rep- 
resents the adoption of a single point of view from the ISA 
framework, one that is roughly equivalent to a combination 
of what Zachman would call the designer's and builder's 
views. The resulting model allows for the examination of 
metadata maintenance workflows in a distributed environ- 
ment. Properly speaking, the structure laid out below is far 
enough removed from the original aims and definitions of 
the ISA framework that one should probably not refer to 
it as such at all, but rather as a Zachman-type or Zachman- 
inspired model. 

The hexagonal diagram in figure 1 represents a sim- 
plified view of how metadata maintenance work can be 
seen in terms of Zachman's interrogatives. The six rect- 
angular boxes depict the way in which these attributes 
can be understood as applied to metadata maintenance 
for a MARC or non-MARC catalog. In addition to the 
maintenance function itself, these attributes include those 
questions that must be answered in relation to each main- 
tenance function: periodicity, or the frequency at which 
administrators should perform the function; policy, or the 
institutional decision or guidelines for performing the func- 
tion; documentation, scripts, and services that describe the 
manual workflows and implement the automated process- 
es that execute the function; the administrative department 
responsible for performing the function; and contact, the 
individual or group designated to receive communications 
regarding the function. Although department and contact 
may be redundant for maintenance functions carried out 
in a small operation, these entities may differ in a larger, 
more distributed environment. For example, in the latter 
case, the contact may be a collection's administrator, while 
the department may be the work unit where the metadata 
maintenance is actually performed. The linearly defined 
facets of the diagram reveal the interrelationships among 
attributes for each maintenance function. The real-world 
context for this description is much more complicated, and 
one must imagine ten levels (representing the ten meta- 
data maintenance functions proposed above), with interre- 
lational linkage in three dimensions among the individual 
boxes and hexagons at all levels, to visualize the complete 
framework on which data maintenance for a given collec- 
tion should ideally be managed. 
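The attributes described above can be pictured as a simple record, one per maintenance function. The following is only a sketch under stated assumptions: the class name `MaintenanceFunction`, its field names, and the helper `unanswered` are hypothetical and do not come from the model itself. The sample values are the box labels that survive from figure 2, with the function name taken from that figure's caption.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class MaintenanceFunction:
    """One level of the model: a maintenance function plus its five attributes.

    Field names are illustrative assumptions, not terms fixed by the model.
    """
    function: str       # the maintenance function itself
    periodicity: str    # frequency at which the function is performed
    policy: str         # institutional decision or guidelines
    documentation: str  # documentation, scripts, and services
    department: str     # administrative unit that performs the work
    contact: str        # person or group that receives communications


def unanswered(entry: MaintenanceFunction) -> list[str]:
    """Return the names of attributes that are still blank for this function."""
    return [name for name, value in asdict(entry).items() if not value.strip()]


# Values taken from the labels of the card-filing (accrual) example.
card_accrual = MaintenanceFunction(
    function="Accrual",
    periodicity="Weekly",
    policy="Institutional decision/guidelines",
    documentation="Instructions for filing cards",
    department="Catalog Maintenance",
    contact="Person/address to whom new cards are delivered",
)

# The same function with no policy recorded, to show an open question.
no_policy = replace(card_accrual, policy="")

print(unanswered(card_accrual))
print(unanswered(no_policy))
```

A blank attribute in such a record corresponds to a question that the institution has not yet answered for that function.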




[Figure 1 box label: Documentation, Scripts, Services (how)]



Can this model be used to describe catalog, database, 
and metadata maintenance across the ages, and thus show 
the increased sophistication and demands of maintain- 
ing library information over time? Can it thereby support 
Bogan's contention that the traditional core competency of 
DBM staff makes these individuals prime candidates for 
metadata maintenance assignments? A few examples are 
in order. 

In a catalog card environment, accrual is defined as 
the filing of new catalog cards. Placed in the context of the 
hexagonal model outlined above, an operational scenario 
might look something similar to figure 2. The diagram in 
figure 2 depicts accrual in terms of the six Zachman inter- 
rogatives: what, when, who, why, where, and how. In this 
hypothetical workflow (which mirrors typical workflows 
for card filing that veterans of library technical services 
may remember from the days before OPACs), cards are 
received and filed weekly. The cards, produced by catalog- 
ers or in batch by a vendor, are sent to a contact person or 
address, from which they are distributed to the staff who 
will ultimately file them. Institutional policy supports and 
provides guidelines for this activity, and the catalog main- 
tenance department will perform the task, according to 
local and national instructions (for example, the ALA filing 
rules). The institutional policy node in the diagram may 
seem a bit vague, but it is key to the operational model: 
whether in writing or simply assumed, the institution has 
made a decision to create cards (or have them created) 
and to file them according to a particular filing system. 
This operational element may be so obvious it seems not 
worth mentioning, but if one imagines the point at which 
the library decides to close or freeze the card catalog (such 
as at the point it decides to move to an online catalog), 
institutional support for this catalog maintenance function 
is withdrawn and the workflow becomes obsolete. The 
model also allows for a decision to take place even before 
a maintenance function is first implemented. 



Figure 1. The operational model for library metadata maintenance 

Source: This diagram is adapted from the hexagonal model found in J. F. 
Sowa and J. A. Zachman, "Extending and Formalizing the Framework for 
Information Systems Architecture," IBM Systems Journal 31, no. 3 (1992): 611. 

Figure 2. An application of the model to the accrual function in 
a catalog maintenance context 

[Figure 2 box labels: Contact: Person/address to whom new cards are delivered; 
Periodicity: Weekly; Department: Catalog Maintenance; Policy: Institutional 
decision/guidelines; Documentation, etc.: Instructions for filing cards] 






In the online environment, or from the point of view 
of what Reid and Fiste term database maintenance, dele- 
tion is defined as the removal of existing records from the 
online system. Using the hexagonal model, figure 3 depicts 
a hypothetical operational approach that addresses this 
maintenance function. As in the previous diagram, figure 
3 outlines the database maintenance functions in terms of 
the six interrogatives. In this scenario, requests to delete 
records from the system are delivered as needed to a contact 
person or address from which the work will be distributed. 
Institutional policy supports and provides guidelines for 
this activity, and the database management department 
will perform the task according to established instructions, 
which may include the use of a computer program or script 
to perform the function. 

Figure 4 illustrates a third example of the use of the 
operational model. In this case, the context is a multifor- 
mat metadata environment, and the function described is 
transformation, defined as the conversion of data from one 
metadata scheme to another. In this scenario, requests for 
transformation are sent to the contact person or address 
(which in this case might be a computer address) as the 
need arises to convert the data. Note that in a multiformat 
metadata environment, the contact is that person or com- 
puter address responsible for a particular catalog or collec- 
tion — not, as in the previous examples, for The Catalog. The 
department that will perform the work, in this hypothetical 
case, will be either a metadata services or database manage- 
ment unit, depending on the catalog and metadata format 
in question. The transformation will be carried out using a 
program or information service, with or without a certain 
amount of manual intervention, again depending on the 
target catalog or metadata format. Once again, institutional 
policy will support and provide guidelines for this activity. 
If the library decides that it will not support transformation 
of data for a given catalog or collection, because of limited 
resources or low prioritization of the activity, this particular 
piece of the overall metadata maintenance model will not be 
implemented. Nonetheless, the model signals the potential 
need for the work and how it can be accomplished, and 
prompts the policy question. 
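As a rough illustration of the transformation function, the sketch below converts a few MARC-style fields to Dublin Core elements and sets aside whatever it cannot map, which stands in for the "certain amount of manual intervention" mentioned above. Everything here is an assumption made for the example: the `CROSSWALK` table is a much-reduced, hypothetical mapping, the record is invented sample data, and a production workflow would rely on a full crosswalk and a real MARC parser.

```python
# Hypothetical, much-reduced crosswalk:
# (MARC tag, subfield code) -> Dublin Core element name.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}


def transform(marc_fields):
    """Convert (tag, subfield, value) triples to a Dublin Core dictionary.

    Returns the converted record and the list of fields that had no mapping,
    so that those fields can be routed to a person for review.
    """
    dublin_core = {}
    unmapped = []
    for tag, code, value in marc_fields:
        element = CROSSWALK.get((tag, code))
        if element is None:
            unmapped.append((tag, code, value))
        else:
            # Dublin Core elements are repeatable, so collect values in lists.
            dublin_core.setdefault(element, []).append(value)
    return dublin_core, unmapped


# Invented sample record, for illustration only.
record = [
    ("100", "a", "Doe, Jane"),
    ("245", "a", "A sample title"),
    ("650", "a", "Gardens"),
    ("650", "a", "Botany"),
    ("300", "a", "3 v."),
]

dc, needs_review = transform(record)
print(dc)
print(needs_review)
```

Returning the unmapped fields separately keeps the automated and manual parts of the workflow distinct, which matches the model's treatment of scripts and documentation as a single attribute.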



Implementation of the Model 

From a purely conceptual point of view, this ISA-inspired 
model offers a framework on which to support a method for 
addressing potential metadata maintenance needs beyond 
those of simply keeping up the MARC-based OPAC. Many 
digital library collections have been built as one-shot enter- 
prises backed by grant money. After the digital images are 
created and the catalog metadata to describe those images 
has been loaded into the delivery system that will serve the 
collection, further editing of the metadata is often forsaken. 
That descriptive metadata might need to be corrected or 
enhanced over time is simply not part of most collection 
developers' or collection managers' mindset. Because the 
metadata for digital library collections is often derived from 
pre-existing MARC metadata, this oversight might initially 
seem a bit strange until one remembers that the managers 
of digital collections are not always technical services staff 
steeped in database maintenance practice and tradition. 10 

Using the operational model described above as a plan- 
ning and documentation tool, digital collection managers 
could work cooperatively across library service divisions 
to pose questions, assign responsibility, and develop policy 
for ongoing maintenance of the collections they oversee. 
Knowing which questions to ask, managers and administra- 
tors also could give themselves the option of deciding not 
to pursue a given maintenance function for certain collec- 
tions. For instance, if the descriptive metadata for a given 
digital collection has been derived from MARC records, 
and headings on the MARC records are subject to authority 
control updates (for example, when death dates are added 
to personal name headings or when Library of Congress 
subject headings are changed or split), collection managers 
may choose to update the descriptive metadata for the cor- 
responding records in the target digital collection as well. 
This work may be done manually, triggered perhaps by a 
routine report sent to the contact person or address in the 
model, or through the use of an automated script. 

Figure 3. An application of the model to the deletion function in 
a database maintenance context 

[Figure 3 box labels: Contact: Person/address who fields delete requests; 
Policy: Institutional decisions/guidelines; Documentation, etc.: Program or 
instructions for performing deletions] 

Figure 4. An application of the model to the transformation 
function in a metadata maintenance context 

[Figure 4 box labels: Maintenance Function: Transformation; Contact: 
Person/address who fields these requests; Department: Metadata Services or 
Database Management; Policy: Institutional decisions/guidelines; 
Documentation, etc.: Program or info service to perform transformation] 
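The authority-control scenario described above might be supported by a script along the following lines. This is a minimal sketch, not a description of any existing tool: the function names, the record identifiers, and the headings are all invented for illustration, and the records are assumed to be available as a plain mapping from identifier to a list of headings.

```python
def headings_needing_update(collection_records, heading_changes):
    """Build the routine report: one row per outdated heading.

    Each row is (record id, old heading, new heading).
    """
    report = []
    for record_id, headings in collection_records.items():
        for heading in headings:
            if heading in heading_changes:
                report.append((record_id, heading, heading_changes[heading]))
    return report


def apply_updates(collection_records, heading_changes):
    """Return a copy of the records with outdated headings replaced."""
    return {
        record_id: [heading_changes.get(h, h) for h in headings]
        for record_id, headings in collection_records.items()
    }


# Invented sample data: a death date has been added to one name heading.
records = {
    "img-001": ["Doe, Jane, 1920-", "Gardens"],
    "img-002": ["Gardens"],
}
changes = {"Doe, Jane, 1920-": "Doe, Jane, 1920-2006"}

report = headings_needing_update(records, changes)
updated = apply_updates(records, changes)
print(report)
print(updated)
```

The report function corresponds to the manual path, in which a list is sent to the contact for review, while `apply_updates` corresponds to the fully automated path. An institution could adopt either or neither, depending on its policy decision.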

The model also provides a conceptual framework for a 
Dublin Core metadata maintenance application profile to 
help manage automated and manual processes and augment 
collection/service registries to support maintenance within 
a local or shared system. 11 Plugging such a model into what 
are so far chiefly object-centered digital registries could 
provide the basis for communication, operation, and policy 
protocols, even in a distributed environment. 12 For instance, 
after reviewing required and available resources for the 
metadata maintenance of a given digital collection, the 
institution(s) involved could approve those functions that it 
chooses to fund and hand the implementation of operational 
details off to a metadata services or database management 
group, which would in turn develop the workflows and 
record its decisions regarding what, why, who, when, where, 
and how, along with the entry for the target collection in the 
digital registry. Reports and scripts could then be developed 
to manage both the manual and automated aspects of the 
ongoing descriptive metadata maintenance. 

Conclusion 

The operational model for library metadata maintenance 
described here offers a simple scheme for organizing the 
resources involved in metadata catalog maintenance opera- 
tions, such as documentation, scripts, and contacts. The 
authors have sought simplicity in the model in order to 
ensure maximum flexibility for its use — whether merely 
to organize concepts and planning, to implement interde- 
partmental or interinstitutional workflows, or to develop 
automated scripts with which to identify and perform main- 
tenance tasks. Refinement of this model will involve clearer 
articulation of its potential use and identification of business 
cases for its implementation. 

It is in this regard that library technical services manag- 
ers may be able to leverage both the traditional skills of their 
catalog maintenance workforce and the potential applica- 
tions of the metadata maintenance model described in this 
paper to address the fundamental operational questions 
posed by Zachman for any complex or distributed work- 
force. Even when faced with limited resources for ongo- 
ing metadata upkeep, the key elements of this operational 
model can provide a framework for developing and dis- 
cussing workflow options and, by extension, can furnish an 
inventory of tasks for use in determining data maintenance 
priorities at an institutional or multi-institutional level. 

References and Notes 

1. Ruth A. Bogan, "Redesign of Database Management at 
Rutgers University Libraries," in Innovative Redesign and 
Reorganization of Library Technical Services: Paths for the 
Future and Case Studies, ed. Bradford Lee Eden, 161-77 
(Westport, Conn.: Libraries Unlimited, 2004). 

2. Ibid., 176. 

3. Ibid., 176. 

4. Marion T. Reid and David Fiste, "Catalog Maintenance: 
Manual to Machine," RTSD Newsletter 11, no. 1 (1986): 4-6. 

5. Ibid., 4-5. 

6. Ibid., 5. 

7. Elaine L. Westbrooks, "Remarks on Metadata Management," 
OCLC Systems & Services 21, no. 1 (2005): 6. 

8. J. A. Zachman, "A Framework for Information Systems 
Architecture," IBM Systems Journal 26, no. 3 (1987): 276-92. 
For examples of adaptations of Zachman's framework, see 
R. Evernden, "The Information FrameWork," IBM Systems 
Journal 35, no. 1 (1996): 37-68; Edmund F. Vail III, "Causal 
Architecture: Bringing the Zachman Framework to Life," 
Information Systems Management 19, no. 3 (2002): 8-18; and 
Chun-Che Huang and Chia-Ming Kuo, "The Transformation 
and Search of Semi-Structured Knowledge in Organizations," 
Journal of Knowledge Management 7, no. 4 (2003): 106-23. 

9. See, for example, the hypothetical case involving the "Oz 
Car Registration Authority (OCRA)" in J. F. Sowa and J. A. 
Zachman, "Extending and Formalizing the Framework for 
Information Systems Architecture," IBM Systems Journal 31, 
no. 3 (1992): 590-616. 

10. For more on metadata maintenance issues surrounding 
the repurposing of MARC metadata for describing digital 
collections, see Martin Kurth, David Ruddy, and Nathan 
Rupp, "Repurposing MARC Metadata: Using Digital Project 
Experience to Develop a Metadata Management Design," 
Library Hi Tech 22, no. 2 (2004): 153-65. 

11. The authors proposed a data model for a Dublin Core meta- 
data maintenance application profile in Martin Kurth and Jim 
LeBlanc, "Toward a Collection-Based Metadata Maintenance 
Model," in Metadata for Knowledge and Learning: DC-2006, 
Proceedings of the International Conference on Dublin Core 
and Metadata Applications, ed. Myriam Cruz Calvario, 31-41 
(Colima: Universidad de Colima, 2006); also available in arXiv 
http://arxiv.org/abs/cs/0605022v1 (accessed Feb. 16, 2007). 

12. Ibid., 38-40. Commentary on relevant object-centered 
registries includes Christophe Blanchi and Jason Petrone, 
"Distributed Interoperable Metadata Registry," D-Lib 
Magazine 7, no. 12 (2001), www.dlib.org/dlib/december01/ 
blanchi/12blanchi.html (accessed Feb. 16, 2007); Stephen 
L. Abrams, "Establishing a Global Digital Format Registry," 
Library Trends 54, no. 1 (2005): 125-43. 






Notes on Operations 

Determining the Average Cost of 
a Book for Allocation Formulas 

Comparing Options 

By Virginia Kay Williams and June Schmidt 

Academic libraries that use allocation formulas to divide monographic funds 
among academic departments frequently include the average cost of books per dis- 
cipline as a variable. Published price indices provide average costs for some sub- 
jects, but for libraries serving interdisciplinary departments, purchasing nonbook 
materials with monographic funds, or purchasing foreign language materials, the 
published price indices may prove insufficient. This study investigates methods of 
determining average prices to be used in allocation formulas. As part of evaluating 
the allocation formula at Mississippi State University, the authors reviewed litera- 
ture pertinent to library use of allocation formulas, surveyed Carnegie Doctoral/ 
Research Extensive land grant university libraries on their use of average price 
as a variable in allocation formulas, and calculated allocations using average 
price data from four sources: The Bowker Annual, previous acquisition cost data, 
Blackwell Price Reports, and Blackwell approval plan profiles. The pros and cons 
of each method of determining average price are discussed. 



Virginia Kay Williams (gwilliams@library.msstate.edu) 
is Assistant Collection Development Officer, and June Schmidt 
(jschmidt@library.msstate.edu) is Associate Dean for 
Technical Services, Mississippi State (Miss.) University. 

The authors wish to thank Damen 
Peterson for assisting with correlation 
statistics. 



Submitted July 14, 2006; rejected with 
significant revisions suggested August 8, 
2006; revised and resubmitted November 
7, 2006; tentatively accepted pend- 
ing revision January 15, 2007; revised 
and resubmitted January 26, 2007, and 
accepted for publication. 



Some academic libraries consid- 
er an allocation formula helpful 
in equitably distributing budgetary 
resources for materials purchases and 
seek to include formula variables that 
reflect the needs and interests of the 
disciplines or departments among 
which the resources are divided. Many 
such libraries use the average cost of 
books per discipline as one of the fac- 
tors in the formula. The price variable 
is used as a proportion, to give depart- 
ments with relatively expensive titles 
a larger share of the available dollars 
than departments with relatively inex- 
pensive titles. If all other variables in 
the formula are equal, the price vari- 
able will allow the library to purchase 
the same number of titles per depart- 
ment, even though one department's 
titles tend to be much more expensive 
than another department's titles. 
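The effect of the price variable can be shown with a small worked example. The sketch below isolates price and holds every other variable equal, so it is not the MSU formula or any library's actual formula. The function name `allocate_by_price`, the department names, the budget, and the average prices are all invented for illustration.

```python
def allocate_by_price(budget, average_prices):
    """Split a budget in proportion to each department's average book price.

    Assumes every other formula variable is equal across departments.
    """
    total = sum(average_prices.values())
    return {
        department: budget * price / total
        for department, price in average_prices.items()
    }


# Invented figures: one department's titles cost three times the other's.
average_prices = {"Department A": 120.0, "Department B": 40.0}
shares = allocate_by_price(16000, average_prices)

# Number of titles each department can buy at its own average price.
titles = {
    department: shares[department] / average_prices[department]
    for department in shares
}

print(shares)
print(titles)
```

With these figures, Department A receives three times the dollars of Department B, yet each department can purchase the same number of titles, which is the outcome the price variable is meant to produce.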

Mississippi State University 
(MSU) Libraries use an allocation 
formula to allocate funds for mono- 
graphic purchases, and that formula 
historically included use of average 
price data from The Bowker Annual: 
Library and Book Trade Almanac 
(Bowker). 1 The current study began 
with the concern that the method 
MSU used for determining average 
price data was inadequate because 
the source data did not match the uni- 
versity's departmental structures well 
and did not address interdisciplinary 
materials, nonbook formats, or titles 
in languages other than English. The 
authors surveyed similar libraries on 
their use of average price as a variable 
in allocation formulas and calculated 
allocations using average price data 
from four sources to answer three 
research questions. First, what meth- 
ods do libraries at similar institutions 
use to determine average price data 
for allocation formulas? Second, to 
what extent do average book prices 
derived from Bowker correlate with 
other data sources? Third, what are 
the pros and cons of each method of 
calculating average price? 

The authors conducted a litera- 
ture survey to determine historical 
and current thinking regarding use of 
allocation formulas and the value of 
including price data in such formulas. 
Similar libraries were surveyed on 
their use of average price as a vari- 
able in allocation formulas. Finally, the 
authors calculated allocations using 
average price data from four sources 
and evaluated the pros and cons of 
each method. 



Background 

MSU Libraries have historically allo- 
cated a portion of the funding avail- 
able for monographic purchases to 
the university's academic departments. 
Selection of materials for each depart- 
ment's funds was made by departmen- 
tal faculty and the librarian assigned 
as liaison to the department. Until 
1992, allocations were made based 
on historical spending patterns with 
occasional adjustments to support new 
programs and accreditation needs. 
In 1992, the library faculty, in con- 
sultation with the University Library 
Committee, decided to implement a 
fund allocation formula using eight 
variables: undergraduate credit hours, 
undergraduate majors, graduate credit 
hours, graduate majors, average cost of 
book in discipline, publishing output, 
relative importance of books and seri- 
als, and local use. Average book cost, 
a critical factor in the formula, would 
be determined by identifying subject 
areas pertinent to each department 
and using average prices from "North 
American Academic Books: Average 
Prices and Price Indexes" published 
in Bowker. 2 

Determining the departments' 
average book costs in this manner was 
not without challenges. One of the 
most serious problems was that the 
subject breakdown used in Bowker 
did not reflect the curricular and 
research needs of a land grant insti- 
tution like MSU. For example, the 
term "Engineering and technology" 
was listed with no subdivisions to 
distinguish price differences among 
MSU's engineering departments. 



The problem was exacerbated by an 
increase in the number of interdisci- 
plinary departments; for example, the 
Geosciences Department includes 
programs in geography, geology, 
meteorology, and geographic infor- 
mation systems. 

Another problem was that data in 
the Bowker table represented hard- 
cover, trade, and paperback books. 
Belanger noted that the data used in 
the calculation of the average price 
of North American academic books 
is derived from titles included in the 
approval plans of Blackwell's Book 
Services and YBP, as well as books 
supplied through all order types from 
Baker and Taylor. 3 The increasing 
number of paperback books included 
in approval plans and sold through 
firm orders deflates the average prices 
and price indices in a manner not 
indicative of purchasing patterns typi- 
cal at the MSU Libraries. While a few 
departments prefer paperback when- 
ever available, most prefer hardcover 
almost exclusively. 

Another challenge was the prefer- 
ence by some departments to collect 
materials in nonbook formats and in 
non-English languages. For example, 
the Music Education Department is 
concentrating on expanding the librar- 
ies' collection of music recordings on 
compact discs while the Philosophy 
and Religion Department requires 
numerous titles in German. Although 
Bowker published data on average 
costs of audiovisual materials, this 
information was omitted after the 
46th (2001) edition. 4 Foreign language 
titles represent a very small percent- 
age of the titles used to compute the 
average prices and price indices in 
Bowker. Cost coverage reports from 
the Blackwell and YBP Web sites sug- 
gest that about 2 percent of the titles 
included in approval plans are pub- 
lished in languages other than English. 
The MSU Libraries had made no 
attempt to adjust the average prices 
based on selection of nonbook media 
or foreign language materials. 



The University Library Com- 
mittee's periodic reviews of the for- 
mula had found no fault with the use 
of an average price variable, but the 
Libraries' collection development unit 
grew concerned with the appropri- 
ateness of determining the average 
price data for each department from 
the Bowker information. In 2004, the 
collection development unit chose 
to use average prices derived from 
"Approval Program Coverage and Cost 
Study" (Blackwell study) compiled by 
Blackwell's Book Services. 5 This study 
details subject areas and list prices of 
monographs covered by Blackwell's 
New Titles/Approval program for aca- 
demic library collections. The col- 
lection development unit customized 
price information by matching each 
academic department's approval slip 
profile with the pertinent subject dis- 
ciplines, a time-consuming procedure. 
For the 2005/06 academic year, the 
collection development unit decid- 
ed to explore options available for 
determining average prices using the 
most current data available from sev- 
eral sources. 



Literature Review 

Allocation formulas and their compo- 
nents have been widely discussed in 
library literature. Critics have pointed 
out that allocation formulas can lead to 
an aggregation of specialized materi- 
als and a lack of general interest and 
interdisciplinary materials; they also 
are often tied to faculty selection, and 
minimize the librarian's role in build- 
ing a balanced collection. 6 Proponents 
say that they can help distribute fund- 
ing equitably, minimize the tendency 
for rapid and vocal selectors to acquire 
a disproportionate share of funding, 
and cope with the political tensions of 
departments fighting for a fair share 
of materials funding. 7 While acknowl- 
edging the limitations of formulas, 
Lowry observed in 1992 that allo- 
cation formulas serve two purposes: 
they help allocate limited resources 
equitably and "are extremely useful 
in solving the dilemmas of unequal 
resource allocation." 8 Lowry proposed 
a matrix formula including as many as 
twelve variables chosen based on insti- 
tutional goals. Many libraries appar- 
ently agree with Lowry that formulas 
are useful since about 40 percent of 
academic libraries surveyed by Budd 
and Adams, and Tuten and Jones, have 
used formulas. 9 

The average cost of materials per 
discipline is a common, though not 
easily established, variable in alloca- 
tion formulas. As Genaway noted, 
the collection of an academic library 
should proportionally reflect all the 
programs, instruction, and research 
of the institution. 10 One could infer 
that subject fields receiving equitable 
library support would acquire the same 
proportion of titles published in their 
field. Since the average cost would 
significantly affect the library's ability 
to acquire titles, a variable to compen- 
sate for the broad variation in average 
prices would help ensure proportional 
subject acquisitions. Biblarz, Bosch, 
and Sugnet noted that book price 
disparities among disciplines should 
be considered as a major piece of data 
that drives resource development in 
respective subject areas. 11 

Librarians have used numerous 
sources for average price information 
and noted problems with each. When 
Randall discussed allocating book 
funds based on average publishing 
output and cost of books published in 
a subject, he relied on A List of Books 
for College Libraries as his source of 
information. 12 However, he acknowl- 
edged that he omitted two foreign lan- 
guage departments from the formula 
because his source contained too few 
priced titles in foreign languages to be 
considered reliable. 

Ellsworth tried using data from 
Publishers' Weekly and Bookseller to 
determine cost and publishing out- 
put by fields, but experienced dif- 
ficulty matching the publications' 
general subject categories to academ- 
ic departments. 13 He also noted that 
these sources did not cover foreign 
books. Sweetman and Wiedemann 
identified subject categories and 
restriction to books published in the 
United Kingdom as problems in using 
price data published in the Library 
Association Record. 14 When Werking 
and Getchell used Choice to estimate 
publishing output and cost, they iden- 
tified matching Choice categories with 
academic departments, lack of voca- 
tional literature, and Choice's focus on 
undergraduate materials as concerns. 15 

The costs compiled from approv- 
al plan vendor data and published 
in Bowker have been used by some 
libraries to determine average cost. 
When Axford compared average prices 
of books purchased through approval 
plans at three academic libraries to 
average prices reported in Bowker, 
he concluded that the differences 
between the published price index and 
the locally generated data were too 
large for Bowker to be used reliably 
for allocating book budgets among 
departments. 16 In a letter replying to 
Axford's article, Lynden and Birkel 
pointed out that the published indi- 
ces were intended to indicate overall 
price trends and provide data for state 
and national policy decisions, not to 
reflect a particular library's buying 
patterns. 17 Lynden and Birkel opined 
that published indices could be useful 
in preparing budget justifications, but 
could not substitute for local statistics. 

Rein discussed using data from 
the most recent Blackwell study. 18 He 
reported the challenge of matching 
the study's subject areas to the aca- 
demic departments of George Mason 
University. Rein noted that data from 
two tables had to be combined to 
cover all the subjects relevant to a 
department. Rein also suggested that 
the library should consider foreign 
publications and nonbook materials in 
future price studies. 

Cubberly also used approval plan 
data, noting the necessity of sort- 
ing the data by Library of Congress 
(LC) classification before matching 
it to department interests as identi- 
fied in the collection development 
policy. 19 Cubberly mentioned the lack 
of data on nonbook, foreign language, 
and retrospective materials as a con- 
cern, especially for the University of 
Southern Mississippi's large music 
program. Similarly, Goehner noted 
a faculty committee's recommenda- 
tion that the library and foreign lan- 
guage department at his university 
work together to identify a book cost 
for foreign language materials, as data 
was not available from the library's 
approval plan vendor. 20 

Bourgeois reported that South- 
west Texas State University uses a 
faculty-determined allocation formula 
that includes two price averages for 
monographs. 21 One average comes 
from vendor's tables and the other 
from the price of monographs bought 
by the library in previous years. As 
O'Connor explained, the University of 
Technology, Sydney, uses actual prices 
paid by the library to calculate average 
cost, because that reflects reality for a 
library purchasing substantial amounts 
of overseas publications. 22 Evans dis- 
cussed the decision to change from 
using average price of previous pur- 
chases to price indices published in 
Bowker when Monash University 
Library (AUS) adopted a new alloca- 
tion formula. 23 As Evans explained, 
the allocation formula committee felt 
that using average cost based on past 
purchases served to "enshrine past 
purchasing practice." 24 The commit- 
tee decided that adding the number of 
monographs published by subject and 
average price from a published index 
would address that concern. 

Research Method: Survey 
of Libraries Similar to MSU 

Because the literature review described 
situations in various types of aca- 
demic libraries, the authors selected 
forty-five Carnegie Doctoral/Research 
Extensive land grant colleges to survey 
about allocation formula use in insti- 
tutions similar to MSU. The survey, 
which appears as an appendix, was 
developed to determine: 

• if the libraries use a formula 
to allocate all or a portion of 
their materials budget to aca- 
demic departments, and if so, 
what percentage is allocated to 
departments; 

• what formats are included in 
the formula; 

• if the average cost of books per 
discipline is used as a variable; 
and 

• how that average cost is deter- 
mined. 

The survey also included ques- 
tions regarding the amount spent 
annually on monographs and the por- 
tion of the monographic budget devot- 
ed to approval plans, if any. Twenty-six 
(58 percent) of the forty-five librarians 
responded. 

Survey Findings 

As table 1 shows, only eight (31 per- 
cent) of survey respondents reported 
using a formula to allocate all or a por- 
tion of the library's materials budget. 
A library's use of a formula did not 
appear to be related to the amount of 
money spent annually on monographs. 
The library with the largest mono- 
graphic budget ($5,000,000) and the 
one with the smallest ($241,500) both 
use allocation formulas. 

Half of the libraries using an allo- 
cation formula included price as one 
of the formula variables and half did 
not. One respondent commented that 
the difficulty of matching the library's 
fund and curricular structure with the 
book disciplines used in price indices 
led to a decision to omit price as a vari- 
able. Respondents mentioned using 
data from book vendors, Bowker, and
local expenditure information from
their integrated library systems to
determine average price.

Research Method: 
Comparison of Average 
Price Data Sources 

Since neither the literature review nor 
the survey of institutions similar to 
MSU pointed to a single best source 
for determining average book prices, 
the collection development unit chose 
to compare price information from 
four data sources: (1) approval plan 
profiles for each MSU department, 
(2) the Bowker study, (3) the Blackwell 
study, and (4) three-year average of 
local expenditures. 

The authors chose to use the 
most current data available from each 
source in June 2006 to calculate aver- 
age prices. The years for each data 
source varied, with the Bowker pric- 
es from 2004, Blackwell study from 
2004-2005, approval plan from 2005, 
and three-year average expenditure 
from 2003-2005. The difference in 
time period covered by each source 
makes a direct comparison of the aver- 
age prices inappropriate, but allocation 
formulas focus on the relationships 
between prices rather than the actu- 
al prices. The authors calculated the 
average price of books for each depart- 
ment from the most current available 
data for each source, as using the most 
current data is normal practice. 

After the average price of books 
for each department was calculated 
from the four data sources, eleven 
departments were chosen for further 
study. The departments represent five 
of MSU's seven colleges, with the 
College of Veterinary Medicine and 
the College of Architecture, Art, and 
Design omitted because of their highly 
specialized content needs. Other crite- 
ria determining selection included: 

• representation of several disciplines
within one department
resulting from the merger
of two or more departments
(counseling, educational psychology,
and special education;
geosciences; human sciences);

• a single department's reliance 
on resources from multiple dis- 
ciplines (agricultural informa- 
tion science, biochemistry and 
molecular biology, entomology 
and plant pathology, industrial 
engineering); 

• a significant percentage of non- 
book or non-English language 
purchases (music education, 
philosophy and religion); and 

• interesting discrepancies among 
the average prices from the 
various data sources (English, 
marketing). 

Approval Plan Profiles 

Approval plan profiles for depart- 
ments were established with Blackwell 
Book Services in the early 1990s. 
Departments are offered the opportu- 
nity to update their profiles annually. 
If the profiles are updated regularly, 
the approval plan data should closely 
reflect the average price of hardcover, 
English-language books of interest to 
the department. Use of the profiles 
eliminated the need to match depart- 
ments with subjects or LC classifica- 
tion ranges. The library did not receive 
books on approval in 2005, but the 
profiles generated electronic approval 
forms for librarians and department 
faculty to review for purchase. The col- 
lection development unit ran reports in 
Blackwell's Collection Manager to com- 
pile approval titles identified during 
2005 by each department's profile into 
Excel spreadsheets. The total number 
of titles and associated prices generat- 
ed by each departmental profile were 
calculated, and an average price was 
determined for each department. The 
unit spent thirty-seven hours creat- 
ing departmental approval plan reports 
from Collection Manager and calculat- 
ing averages from the reports. 



64 Williams and Schmidt 



LRTS 52(1) 



Table 1. Responses to survey on library allocation formulas (N=26 unless noted)

                                                              Yes    No   Total
Does library use a formula to allocate materials budget?        8    18      26
If yes, which formats are included in the formula?              8    18      26
Formats included (N=8):
  Serials                                                     [illegible]
  Monographs                                                    8             8
  Electronic resources                                          2     6       8
  Audiovisual                                                   4     5       8
  Other                                                               8       8
Is the average cost of a book per discipline one of the
  variables? (N=8)                                              5     3       8

How much does the library spend annually on monographs?
  Less than $1,000,000                                         10
  $1,000,000 to $2,000,000                                      9
  $2,000,000 and above                                          7

Is a portion of monographic budget devoted to approval plan?
(N=24; two respondents answered "no.")
  Less than 30 percent                                          9
  Between 30 percent and 50 percent                             9
  50 percent and above                                          6

Is a portion of your monographic budget allocated to academic departments?
(N=13; 6 respondents answered "no" and 7 qualified answers with phrases
such as "to subject librarians.")
  Less than 35 percent                                          4
  Between 35 percent and 70 percent                             5
  70 percent and above                                          4



Bowker Annual 

Average prices were determined for 
each department from the 2004 prices 
listed in the 2006 Bowker Annual. 
The two-year delay in reported prices 
is a factor to be considered in using 
Bowker as a source for average price 
data. Subjects were matched to departments
based on the LC classification
numbers assigned to each department
by the collection development unit.
When more than one subject was
appropriate for a department, price
data and number of titles for each subject
were used to calculate an average
price for the department. Calculating
average prices from Bowker required
two hours, much of which was spent
deciding how to match subjects to 
departments. 
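The department averages reported in table 2 are consistent with a title-weighted mean of the subject averages. The following is a minimal Python sketch under that assumption; the article does not print the formula, and the function name is illustrative only.

```python
from typing import Iterable, Tuple


def weighted_average_price(subjects: Iterable[Tuple[float, int]]) -> float:
    """Title-weighted mean over (subject_average_price, title_count) pairs."""
    subjects = list(subjects)
    total_titles = sum(count for _, count in subjects)
    if total_titles == 0:
        raise ValueError("at least one title is required")
    # Each subject's average price times its title count gives its total price.
    total_cost = sum(price * count for price, count in subjects)
    return total_cost / total_titles


# Geosciences in table 2: Geography (69.05, 699 titles), Geology (92.33, 222 titles).
geosciences = [(69.05, 699), (92.33, 222)]
print(round(weighted_average_price(geosciences), 2))  # 74.66
```

Weighting by title count keeps a subject with few titles, such as Geology here, from pulling the department figure as strongly as a simple mean of the two subject averages would.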

As table 2 shows, seven of eleven 
MSU departments were matched with 
more than one subject from the price 
index published in Bowker. Among 
the seven departments was the School 
of Human Sciences, which includes 
programs in fashion design and mer- 
chandising, family and consumer sci- 
ence education, and family and youth 
studies. Four Bowker subjects are 
required to cover the Human Sciences 
topics, with a price range from $28.69 
to $62.30. Even the more traditional 
Industrial Engineering Department 
acquires materials in disciplines as 
diverse as engineering and business. 

In cases where the collection 
development unit was able to match a 
department with a single subject, the 
Bowker study is still not ideal for every 
department at MSU because its cover- 
age focuses on books published or dis- 
tributed in North America and written 
almost exclusively in English; less than 
2 percent of titles used in compiling the 
Bowker price index are in non-English 
languages. 25 These factors would be 
problematic for departments selecting 
a significant percentage of materi- 
als in nonbook format or written in 
non-English languages. For example, 
during the fiscal years 2003 through 
2005, 12 percent of items acquired for 
Philosophy and Religion were in lan- 
guages other than English. During the 
same time period, 58 percent of the 
items acquired for Music Education 
were nonbook formats. 

Table 2. Average prices from Bowker (2004)

                                                                 Subject               Department
                                                                 average     No. of    average
Department                          Bowker subject               price ($)   titles    price ($)
Agricultural information systems    Agriculture                      68.45    1,057        52.92
                                    Education                        46.45    2,536
Biochemistry and molecular biology  Chemistry                       166.26      450       105.24
                                    Zoology                          91.56    1,765
                                    Science (general)                95.60      343
Counseling, educational             Education                        46.45    2,536        46.45
  psychology, and special
  education
English                             Literature and language          33.33   15,242        33.33
Entomology and plant pathology      Science (general)                95.60      343        95.60
Geosciences                         Geography                        69.05      699        74.66
                                    Geology                          92.33      222
Human sciences                      Business and economics           70.21    5,900        59.36
                                    Fine and applied arts            48.41    3,728
                                    Industrial arts                  30.13      230
                                    Home Economics                   34.07      652
Industrial engineering              Engineering and technology      100.09    4,933        83.82
                                    Business and economics           70.21    5,900
Marketing                           Business and economics           70.21    5,900        70.21
Music Education                     Education                        46.45    2,536        47.62
                                    Fine and applied arts            48.41    3,728
Philosophy and religion             Philosophy and religion          48.63    5,026        48.63

Blackwell

Since the broad subjects in Bowker do
not accurately reflect MSU's departments,
the collection development unit
matched departments to the eight-digit
LC table of the Blackwell study,
one of the underlying sources of the
Bowker data. The LC classifications
had been assigned to each department
previously for collection evaluation
projects. The number of titles and
total price for each classification range
assigned to a department were added;
then the department's total price was
divided by its total number of titles
to determine the average price from
the Blackwell study. The unit spent
13.5 hours calculating average prices
from the Blackwell study, including
converting the Blackwell study eight-digit
LC table to a spreadsheet, identifying
matching classification ranges
for each department, and performing
price calculations.

While the Blackwell study's LC 
classification ranges do not precisely 
match those used by MSU, the variations
are slight in most instances. An
exception is Biochemistry and Molecular
Biology, for which the Blackwell ranges
were much broader than those needed.
Using the Blackwell study allowed the 
collection development unit to match 
subjects and departments closely, but 
in areas with small publication outputs 
the validity of the data may be doubt- 
ful. Of the 11 departments in this 
study, Human Sciences had the lowest 
output with 52 titles, and English had 
the highest with 5,221 titles. Using 
the Blackwell study data also does 
not address concerns with lack of 
pricing data for non-English language 
books and nonbook materials because 
98 percent of titles included are in 
English and no nonbook materials 
are included. 

Local Expenditures 

Because published price indices such 
as Bowker do not adequately cover 
non-English and nonbook materials 
and do not reflect vendor discounts, 
the collection development unit con- 
sidered using the library's expendi- 
tures to determine average prices. 
Average monographic expenditures 
were calculated from the integrated 
library system by dividing the total 
expenditures by the number of titles 
purchased from each department's 
monograph fund. To investigate the 
extent to which average expenditures 
fluctuate, averages were calculated 
for each of the three most recently 
completed fiscal years (2003, 2004,
2005) and for the three-year period
2003 through 2005. As seen in table 
3, the average expenditure by depart- 
ment varies substantially from year 
to year. 

Acquiring journal backfiles, a 
multivolume title, or an important 
but expensive out-of-print title can 
cause the average expenditure to be 
unusually high one year. Timing of 
requests may affect expenditures for 
a department, as acquisitions staff 
may not have time to search for ven- 
dors offering the best discounts when 
selectors submit many requests late 
in the fiscal year. Another problem 
with using average expenditures from 
a single year is the small number of 
monographic titles acquired to sup- 
port departments that rely heavily 
on serials or that have relatively low 
credit hour production. Using a three- 
year average expenditure helps to 
level these fluctuations. Determining 
average expenditure by department 
required slightly less than one hour, 
including time to collect the data for 
each of three fiscal years and calcu- 
late a three-year average. 
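The smoothing effect of a multi-year figure can be illustrated with a short Python sketch. The fund history below is hypothetical, not MSU data, and pooling total spending over total titles is an assumption based on how table 4 reports the three-year column (total cost, total titles, average price).

```python
from typing import Dict, Tuple

# Hypothetical (total_spent, titles_purchased) per fiscal year for one fund.
fund_history: Dict[int, Tuple[float, int]] = {
    2003: (1200.00, 30),
    2004: (2600.00, 20),  # e.g. a year with one expensive multivolume title
    2005: (1000.00, 25),
}


def yearly_averages(history: Dict[int, Tuple[float, int]]) -> Dict[int, float]:
    """Average expenditure per title for each fiscal year."""
    return {year: spent / titles for year, (spent, titles) in history.items()}


def pooled_average(history: Dict[int, Tuple[float, int]]) -> float:
    """Total spent over all years divided by total titles over all years."""
    total_spent = sum(spent for spent, _ in history.values())
    total_titles = sum(titles for _, titles in history.values())
    return total_spent / total_titles


print(yearly_averages(fund_history))  # {2003: 40.0, 2004: 130.0, 2005: 40.0}
print(pooled_average(fund_history))   # 64.0
```

In this made-up case the single-year averages swing between 40.00 and 130.00, while the pooled figure of 64.00 dampens the effect of the one unusual year.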

Comparison of Average 
Price Data Sources: Findings 

The approval plan and Blackwell study 
methods tended to produce higher 
average prices, while the Bowker and 
expenditure methods tended to pro- 
duce lower average prices (see table 
4). Lower prices were expected from 
Bowker because the data reflected 
two-year old prices. The expendi- 
ture method produced lower prices 
because it reflected discounts received 
by the library, while other methods 
reflected list prices. Allocation for- 
mulas rely on the proportional differ- 
ences in average book price among 
departments, not the absolute price. 
While the various methods produced 
differing average prices, the authors 
wanted to determine if all methods 
produced similar proportional differ- 
ences among the departments. 
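Why proportional differences matter more than absolute prices can be shown with a deliberately simplified Python sketch. It assumes a hypothetical formula in which average price is the only variable; real allocation formulas combine several variables, and the department prices below are invented for illustration.

```python
import math
from typing import Dict


def price_shares(avg_prices: Dict[str, float]) -> Dict[str, float]:
    """Return each department's share of the summed average prices."""
    total = sum(avg_prices.values())
    if total <= 0:
        raise ValueError("average prices must sum to a positive value")
    return {dept: price / total for dept, price in avg_prices.items()}


# Hypothetical list prices, and the same prices after a uniform 15 percent discount.
list_prices = {"English": 40.0, "Geosciences": 80.0, "Marketing": 80.0}
discounted = {dept: price * 0.85 for dept, price in list_prices.items()}

list_shares = price_shares(list_prices)
discount_shares = price_shares(discounted)

# A uniform change in price level leaves every department's share unchanged.
unchanged = all(
    math.isclose(list_shares[d], discount_shares[d]) for d in list_shares
)
print(unchanged)  # True
```

A source that is uniformly lower, for example because of a flat discount or older prices, would therefore yield the same shares. Sources that shift some departments more than others would not, which is the question the correlation analysis addresses.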



The authors compared the four 
pricing methods using Pearson correla- 
tions (r) (see table 5). All four methods 
are significantly related to each other 
at the p<0.05 level (n=11, two-tailed).
The Blackwell study and approval plan
prices were the most closely related;
the correlation between them was significant
at the p<0.01 level (n=11,
two-tailed). Significant correlations 
between the data were expected, as all 
four methods are attempts to measure 
the same variable, the average price 
of monographic materials in a specific 
academic field. 
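The comparison described above can be sketched in plain Python. The function is a textbook Pearson correlation, not the authors' own software, and it does not perform the significance test. The two lists are the Bowker and three-year expenditure average prices as they appear in table 4; because those figures are rounded, the result need not match the published coefficient exactly.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariation / math.sqrt(ss_x * ss_y)


# Average prices by department, in the row order of table 4.
bowker = [52.92, 105.24, 46.45, 33.33, 75.10, 74.66, 59.36, 83.82, 70.21, 47.62, 48.63]
expenditure = [47.30, 80.82, 53.53, 26.27, 91.56, 70.46, 55.22, 84.30, 46.10, 24.74, 38.55]

print(f"r = {pearson_r(bowker, expenditure):.2f}")
```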

Table 3. Average cost of monograph (2003-2005)

Department                                    2003 ($)  2004 ($)  2005 ($)  Three-year ($)
Agricultural information systems                 44.41     61.60     41.85           47.30
Biochemistry and molecular biology              121.37     79.26     68.85           80.82
Counseling, educational psychology,
  and special education                          43.82     42.47     79.35           53.53
English                                          35.89     25.15     23.53           26.27
Entomology and plant pathology                   82.67     98.27     97.08           91.56
Geosciences                                      75.85     62.64     71.66           70.46
Human sciences                                   80.76     50.92     46.77           55.22
Industrial engineering                           83.06     86.57     84.37           84.30
Marketing                                        38.23     59.59     43.09           46.10
Music education                                  22.59     20.16     37.72           24.74
Philosophy and religion                          48.18     25.80     49.30           38.55

The authors computed the coefficient
of determination (r²) to determine
the shared variability among the
methods. If the four methods are producing
very similar proportional differences
in average prices, one would
expect the shared variability to be
very close to 1.00. As shown in table
6, the shared variability ranges from
0.46 between the approval plan and
expenditure methods to 0.83 between
the Blackwell study and approval plan
methods. Although the average prices
calculated by all four methods are significantly
correlated, the coefficient of
determination shows that the source
of the price data does make a difference
in the proportional difference
among the average prices, and in the
department's allocations.

Because none of the coefficients
of determination are very close to
1.00, the methods of determining
average price are not producing nearly
interchangeable proportional differences.
The MSU Libraries will need to
consider all the pros and cons of each
method to select the most appropriate
method for local use.
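The link between tables 5 and 6 can be checked by squaring each correlation. The Python sketch below uses the rounded r values printed in table 5; the dictionary layout is illustrative only.

```python
# Pearson r values as printed in table 5, keyed by the pair of methods compared.
table5_r = {
    ("Approval plan", "Bowker"): 0.76,
    ("Approval plan", "Three-year expenditure"): 0.68,
    ("Approval plan", "Blackwell"): 0.91,
    ("Blackwell", "Bowker"): 0.89,
    ("Blackwell", "Three-year expenditure"): 0.72,
    ("Three-year expenditure", "Bowker"): 0.86,
}

# Coefficient of determination: the square of r, rounded to two decimals.
r_squared = {pair: round(r ** 2, 2) for pair, r in table5_r.items()}

for (method_a, method_b), value in r_squared.items():
    print(f"{method_a} vs. {method_b}: r^2 = {value:.2f}")
```

Five of the six results agree with table 6. The Blackwell and Bowker pair comes out at 0.79 rather than the published 0.80, presumably because the published value was computed from an unrounded r; that explanation is an assumption, not something the article states.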






Table 4. Comparison of average prices computed by four methods
(Cost = total cost; Titles = total titles; Avg = average price)

                                      Bowker average           Three-year average       Blackwell study          Approval plan
                                      (2004)                   expenditure (2003-2005)  (2004-2005)              (2005)
Department                           Cost ($)  Titles Avg ($) Cost ($)  Titles Avg ($) Cost ($)  Titles Avg ($) Cost ($)  Titles Avg ($)
Agricultural information systems      190,149   3,593   52.92    4,446      94   47.30    3,627      56   64.77    8,282     931   94.82
Biochemistry and molecular biology    269,211   2,558  105.24    4,526      56   80.82   31,945     220  145.20   42,730     294  145.34
Counseling, educational psychology,
  and special education               117,797   2,536   46.45    7,601     142   53.53   16,663     317   52.56  233,599   3,699   63.15
English                               508,016  15,242   33.33   59,285   2,257   26.27  204,668   5,221   39.20  284,975   5,901   48.29
Entomology and plant pathology        105,142   1,400   75.10    4,944      54   91.56    8,235      88   93.58   16,738     163  102.69
Geosciences                            68,763     921   74.66   21,701     308   70.46   31,865     348   91.57   70,145     693  101.22
Human sciences                        623,855  10,510   59.36    5,025      91   55.22    3,128      52   60.16  201,466   2,733   73.72
Industrial engineering                907,984  10,833   83.82    7,081      84   84.30   64,368     906   71.05  113,030   1,271   88.93
Marketing                             414,239   5,900   70.21   10,003     217   46.10   13,015     177   73.53  279,541   2,460  113.63
Music education                       298,270   6,264   47.62    7,842     317   24.74   44,533     850   52.39   30,684     591   51.92
Philosophy and religion               244,414   5,026   48.63    8,095     210   38.55    5,756     100   57.56  298,554   3,896   76.63






Implications for 
Other Libraries 

Some factors considered by MSU that
may be pertinent to other libraries
considering sources for price data
included difficulty of matching source
subjects to departments, inclusion of
non-English and nonbook materials or
of paper bindings, currency of price
data, and staff time needed to arrive
at price averages. Table 7 summarizes
factors that might be taken into
account by MSU and other libraries
with similar programs.

Table 5. Pearson correlation matrix for price comparisons (N=11)

                          Bowker    Three-year     Blackwell
                                    expenditure
Approval plan             0.76**    0.68**         0.91*
Blackwell                 0.89**    0.72**
Three-year expenditure    0.86**

*p<.01, two tails; **p<.05, two tails

Table 6. Coefficient of determination (r²) matrix for price comparisons

                          Bowker    Three-year     Blackwell
                                    expenditure
Approval plan             0.58      0.46           0.83
Blackwell                 0.80      0.52
Three-year expenditure    0.74

Table 7. Advantages and disadvantages of four methods of computing average price

                                             Bowker    Three-year      Blackwell    Approval
                                                       expenditure     study        plan
Requires matching departments to subject
  or LC classification                       Yes       No              Yes          No
Non-English materials included               <2%       In proportion   About 2%     No
                                                       to purchases
Nonbook materials included                   No        In proportion   No           No
                                                       to purchases
Staff time required to compile at MSU        2 hours   1 hour          13.5 hours   37 hours
Paper bindings included                      Yes       Yes             Yes          No
Most current data available in June 2006     2004      2003-2005       2004-2005    2005

Two of the methods, expenditure
and approval, do not require matching
the source data's subject divisions to
departments. The expenditure method
uses the prices of titles specifically
selected and purchased to support
the departments' programs. However,
the library will need an alternative
way to determine average price when
new programs are established or
departmental programs change dramatically.
The approval method uses
existing profiles associated with the
library's approval or new book notification
plans that already match subjects
to departments. Libraries should be
aware of how non-subject limits on
the approval profile may affect the
average price. For example, approval
plans can be limited to university press
titles or to books without media, which
may produce average prices that differ
from the average prices of books from
all publishers or in all formats.

The Bowker and Blackwell study 
methods both require matching the 
source data's subject divisions to 
departments. Bowker data are divided 
into broad subjects, so the library may 
need to use the same price data for 
several departments. Interdisciplinary 
subjects may need to be matched to 
several Bowker subjects. The Blackwell
study data are divided into more than
one thousand subject divisions based
on LC classifications, allowing close
matches between the departments'
interests and the subject divisions.

The library may consider the
inclusion of price data on non-English
materials, nonbook materials, or paper
bindings important. A very small
proportion of titles in the Blackwell
study and Bowker data are non-English,
while approval plans may exclude
non-English materials completely.
The Bowker and Blackwell study data do
not include nonbook materials, and
approval plans may exclude nonbook
materials as well. The Bowker data
include paper bindings, while the
Blackwell study presents
data on cloth and paper bindings separately,
and libraries can choose whether
to include paper bindings in approval
profiles. The expenditure method has
the advantage of including price data
for non-English materials, nonbook
media, and paper bindings in the same
proportion as they are purchased. When
non-English materials or nonbook media
are a significant portion of the library's
purchases, the library should seriously
consider using average expenditure
in the allocation formula. Libraries
also should consider whether the
data source they use for determining
average price matches the library's
preference for acquiring paper or
cloth bindings.

Currency of data also may be a 
consideration for libraries. The MSU 
comparison used the most recent data 
available for each source at the time 
the study was conducted. The Bowker 
data were for books published in 2004,
the Blackwell study for books published 
between July 2004 and June 2005, the 
approval plan for books published in 
2005, and the expenditure method 
for books purchased in 2003-2005. 
Allocation formulas rely on the pro- 
portional differences in average book 
price among departments as opposed 
to the absolute price, so the currency 
of the data may not be important to 
the library. When the library uses 
the same average price data for other 
purposes, such as budget requests, 
currency may be a factor. 

Finally, the staff time required 
to compile the data and compute 
average price should be considered. 
The expenditure and Bowker meth- 
ods required minimal staff time. The 
Blackwell study method was more 
time-consuming, but had the advan- 
tage of allowing the library to match 
subjects to departments more closely 
than Bowker allows. The Blackwell 
study also reflects the English-lan- 
guage academic book market, while 
expenditures reflect only a small por- 
tion of the titles available for purchase. 
The approval plan method was very
time-consuming, a major disadvantage
for most libraries.



Suggestions for Further Study 

Libraries use price data in many ways, 
such as allocating funding among 
departments or broad subjects, prepar- 
ing budgetary projections, and assess- 
ing charges for lost materials. The 
Library Materials Price Index Group 
(LMPI) has noted that the Bowker 
indices can serve as useful bench- 
marks against which local costs can be 
compared, but they cannot substitute 
for cost data that reflect collecting pat- 
terns of individual libraries. 26 LMPI 
has indicated an interest in pursuing 
studies that correlate individual librar- 
ies' costs with the national prices. 
This study found a significant cor- 
relation between MSU expenditures 
supporting specific departments and 
the Bowker price index. Expanding 
the study to include all MSU depart- 
ments would be necessary to validate 
this finding, and could serve as a 
case study for other libraries consider- 
ing the most appropriate method for 
determining average price data for use 
in allocation formulas. 



References 

1. Janet Belanger Morrow, "Prices of 
U.S. and Foreign Published Materials" 
in The Bowker Annual: Library and 
Book Trade Almanac, 51st ed., ed. 
Dave Bogart, 495-515. (New York: 
Information Today, 2006). 

2. Ibid. 

3. Ibid. 

4. Sharon G. Sullivan, "Prices of U.S. 
and Foreign Published Materials" in 
The Bowker Annual: Library and 
Book Trade Almanac, 46th ed., ed. 
Dave Bogart, 459-484. (New York: 
Information Today, 2001). 

5. Blackwell Book Services US Approval 
Coverage and Cost Study, July 1, 
2004-June 30, 2005, www.blackwell 
.com/librarian_resources/cost_and 
_coverage (accessed Jan. 31, 2006). 



6. Theodore W. Koch, "The Apportionment
of Book Funds in
College and University Libraries,"
ALA Bulletin 2, no. 3 (1908): 341-47;
Harry Bach, "Why Allocate?" Library
Resources & Technical Services 8,
no. 2 (1964): 161-64; Richard Hume
Werking, "Allocating the Academic
Library's Book Budget: Historical
Perspectives and Current Reflections,"
Journal of Academic Librarianship
14, no. 3 (1988): 140-44; John M.
Budd, "Allocation Formulas in the
Literature: A Review," Library
Acquisitions: Practice & Theory 15,
no. 1 (1991): 95-107.

7. Charles M. Baker, "Apportioning 
of College and University Library 
Book Funds," Library Journal 57, 
no. 3 (Feb. 1932): 166-67; Peter 
Sweetman and Paul Wiedemann, 
"Developing a Library Book-Fund 
Allocation Formula," Journal of 
Academic Librarianship 6, no. 5 
(1980): 268-76; Donna Packer, 
"Acquisitions Allocations: Equity, 
Politics, and Formulas," Journal of 
Academic Librarianship 14, no. 5 
(1988): 276-78; Werking, "Allocating 
the Academic Library's Book Budget"; 
Budd, "Allocation Formulas in the 
Literature: A Review." 

8. Charles B. Lowry, "Reconciling
Pragmatism, Equity, and Need in
the Formula Allocation of Book and
Serial Funds," College & Research
Libraries 53, no. 2 (Mar. 1992): 124.

9. John M. Budd and Kay Adams,
"Allocation Formulas in Practice,"
Library Acquisitions: Practice &
Theory 13, no. 4 (1989): 381-90; Jane
H. Tuten and Beverly Jones, Allocation
Formulas in Libraries: CLIP Note 22
(Chicago: ACRL, 1995).

10. David C. Genaway, "PBA: Percentage
Based Allocation for Acquisitions: A
Simplified Method for the Allocation
of the Library Materials Budget,"
Library Acquisitions: Practice &
Theory 10, no. 4 (1986): 287-92.

11. Dora Biblarz, Stephen Bosch, and
Chris Sugnet, eds., Guide to Library
User Needs Assessment for Integrated
Information Resource Management
and Collection Development, Collection
Management and Development
Guides no. 11 (Lanham,
Md.: Scarecrow; Chicago: Association
for Library Collections & Technical
Services, 2001).

12. William M. Randall, "The College 
Library Book Budget," Library 
Quarterly 1, no. 4 (1931): 421-35.

13. Ralph E. Ellsworth, "Some Aspects of 
the Problem of Allocating Book Funds 
Among Departments in Universities," 
Library Quarterly 12, no. 3 (July 
1942): 486-94. 

14. Sweetman and Wiedemann, "Devel- 
oping a Library Book-Fund Allocation 
Formula." 

15. Richard Hume Werking and Charles 
M. Getchell Jr., "Using Choice As 
a Mechanism for Allocating Book 
Funds in an Academic Library," 
College & Research Libraries 42, no. 
2 (Mar. 1981): 134-38. 

16. H. William Axford, "The Validity of
Book Price Indexes for Budgetary
Projects," Library Resources &
Technical Services 19 (Winter 1975):
5-12.

17. Fred C. Lynden and Paul E. Birkel,
"Book Price Indexes," Library
Resources & Technical Services 20,
no. 1 (Winter 1976): 97-98.

18. Laura O. Rein et al., "Formula- 
Based Subject Allocation: A Practical 
Approach," Collection Management 
17, no. 4 (1993): 25-48. 

19. Carol Cubberley, "Allocating the 
Materials Funds Using Total Cost 
of Materials," Journal of Academic 
Librarianship 19, no. 1 (1993): 
16-21. 

20. Donna M. Goehner, "Allocating by 
Formula: The Rationale from an 
Institutional Perspective," Collection 
Management 5, no. 3/4 (1983): 
161-73. 

21. Eugene Bourgeois et al.,
"Faculty-determined Allocation Formula at
Southwest Texas State University,"
Collection Management 23, no. 1/2
(1998): 113-23.

22. Steve O'Connor, Susan Lafferty, 
and Ann Flynn, "The Materials 
Acquisition Process at the University 
of Technology, Sydney: Equitable 
Transparent Allocation of Funds," 
Australian Academic and Research 
Libraries 29, no. 1 (1998): 23-33. 

23. Merran Evans, "Library Acquisitions 
Formulae: The Monash Experience," 
Australian Academic and Research 
Libraries 27, no. 1 (1996): 47-57. 

24. Ibid., 53. 

25. Stephen Bosch, e-mail message to 
June Schmidt, June 14, 2006. 

26. Morrow, "Prices of U.S. and Foreign 
Published Materials." 



Appendix. Survey Questions on Library Allocation Formulas 

1. Does the library use a formula to allocate the materials budget (all or a portion) among academic departments? 
If not, skip to question 5. 

2. If so, please indicate all formats that are included in the formula: 

a) serials b) monographs c) electronic resources d) audiovisual e) other please explain other: 

3. Is the average cost of a book per discipline one of the variables? 

4. If so, how do you determine the average cost? 

5. How much does the library spend annually on monographs? 

6. Is a portion of your monographic budget devoted to an approval plan? If so, what percentage? 

7. Is a portion of your monographic budget allocated to academic departments? If so, what percentage? 

8. Would you like us to e-mail you a summary of survey results? 






Book Review 

Edward Swanson 



IFLA Cataloguing Principles: Steps towards an International 
Cataloguing Code, 3: Report from the 3rd IFLA Meeting of 
Experts on an International Cataloguing Code, Cairo, Egypt, 
2005. Eds. Barbara B. Tillett, Khaled Mohamed Reyad, and
Ana Lupe Cristán. Munich: K. G. Saur, 2006. 197 p. $109
(IFLA members $81) cloth (ISBN 3-598-24278-6). IFLA 
Series on Bibliographic Control, vol. 29. 

This volume is the third in a series documenting the 
proceedings of the regional IFLA Meetings of Experts on 
an International Cataloguing Code (IME ICC). The previ- 
ous meetings were held in Frankfurt, Germany, in 2003, for 
the European and North American experts, and in Buenos 
Aires, Argentina, in 2004, for the Latin American region. 
The reports of those meetings were reviewed in LRTS vol- 
ume 49, number 3, and volume 50, number 4, respectively. 

The Cairo meeting was attended by cataloging experts 
from the Arabic-speaking world, and many of the papers 
are presented in both English and Arabic. The presentation 
papers have been carried over from the previous volumes, 
but in somewhat updated versions. Even though they are 
more background than directly part of the international cat- 
aloging principles, John Byrum's paper on the International 
Standard Bibliographic Description (ISBD), delivered by 
Mauro Guerrini; Patrick Le Boeuf's paper on the Functional
Requirements for Bibliographic Records (FRBR), delivered
by Elena Escolano Rodríguez; and Barbara Tillett's paper
on the virtual international authority file should be useful 
to anyone looking for a succinct summary of the principles 
of these concepts that underlie the structure of our present 
and future catalogs. 

The statement of principles is a work in progress, and 
several drafts of the principles are presented in the book. It 
is sometimes hard to understand the relationships between 
the drafts, but the latest one is a post-meeting draft from 
April 2006, after voting and discussion among the Middle 
Eastern participants and subsequent voting by participants 
in the previous two meetings. This draft rectifies some of the
problems seen in the report of IME ICC2, but it still has a
discussion of corporate body access points that sounds very
similar to the main entry rules in 9.1 of the Paris Principles,
even though the new principles otherwise avoid the ques- 
tion of main entry. Both in section 1, "Scope," and in the 
appendix on objectives for the construction of cataloging 
codes, catalog user convenience is claimed as the highest 
principle, which is a laudable goal, but there is little guid- 
ance on what this means and no apparent recognition of the 
fact that there are many kinds of users. 

As in the previous meetings, there were working groups 
set up for personal names, corporate names, seriality, uni- 
form titles, and the general material designation as well as 
for multivolume or multipart structures. Summaries of their 
discussions and recommendations are presented. There was 
not a lot new to be added in this third round of work on 
the principles, but it was striking how many difficult issues 
are presented by different languages and scripts, even in 
countries that have long been working with AACR2 and the 
ISBDs. A glossary is being developed to go along with the 
statement, and at least one working group noted the need 
for a definition for "persona," which seems especially neces- 
sary given the confusion that seems to have revolved around 
this term for the different bibliographical identities recog- 
nized by some cataloging rules, including AACR2. 

As with the predecessor volumes, the value of this book 
lies in its documentation of the process of developing a 
statement of principles to replace the 1961 Paris Principles 
and making recommendations for a possible future interna- 
tional cataloging code. A fourth meeting was held for Asian 
experts in Seoul, South Korea, in 2006, and the fifth and last 
meeting for sub-Saharan African experts in Durban, South 
Africa, in 2007. A final draft of the statement is expect- 
ed to be sent out for worldwide review in 2008. — John 
Hostage (hostage@law.harvard.edu), Harvard Law School, 
Cambridge, Mass. 






Canada Institute for 
Scientific and Technical 
Information 






Connect first to 

NRC-CISTI Library Services 



and seamlessly expand your information world 



Come to us for speedy access to a network of information 
professionals and resources. Discover one of the largest 
collections of scientific, technical, engineering and 
medical documents in the world dating back to 1521. 



Fulfill your information needs with our fast and 
reliable document delivery service options and our 
new e-tools. Formal accounts are not required for 
our new eBook Loan and Pay Per Article services. 



1-800-668-1222 (Canada & U.S.) 
1-613-998-8544 (international) 



Contact us to get connected! 



info.cisti@nrc-cnrc.gc.ca 
www.cisti.nrc.gc.ca 



National Research Council Canada 

Conseil national de recherches Canada 



Canada 





Association for Library Collections & Technical Services 

a division of the American Library Association 

50 East Huron Street, Chicago, IL 60611 • 312-280-5038 

fax: 312-280-5033 • Toll free: 1-800-545-2433 • www.ala.org/alcts 




